Tuesday, March 3, 2020

7 Types Of Artificial Neural Networks In Linguistic Data Processing

An Application of Artificial Neural Networks in Linguistics

In this article, we try to answer the following questions: What is an artificial neural network? How does it work? What types of artificial neural networks are there? And how are they used in processing natural languages?
An artificial neural network (ANN) is a non-linear computational model inspired by the neural structure of the brain. It learns from examples to solve tasks such as classification, prediction, decision-making, visualization and more.
An artificial neural network consists of artificial neurons (processing elements) and can be represented as a structure of three interconnected layers: the input layer, the hidden layer (which may itself consist of several layers) and the output layer.
The input layer consists of input neurons that pass information on to the hidden layer; the hidden layer in turn passes data to the output layer. Each neuron has three parts: weighted inputs (synapses), an activation function and an output. The synapses are the adjustable parameters that turn the neural network into a parameterized system, and the activation function determines the neuron's output for a given input.


Artificial neuron with four inputs
The activation signal is computed as the weighted sum of the inputs and passed through the activation function, which produces the neuron's output. The following types of activation functions are commonly distinguished: linear, step, sigmoid (logistic), tanh and rectified linear unit (ReLU).
Linear function
f(x) = ax

Step function
f(x) = 1 if x >= 0, otherwise 0

Logistic (sigmoid) function
f(x) = 1 / (1 + e^(-x))

Tanh function
f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Rectified linear unit (ReLU)
f(x) = max(0, x)

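To make these definitions concrete, here is a minimal NumPy sketch of the activation functions above, applied to the weighted sum of a neuron's inputs. The input values, weights and bias are arbitrary illustrations, not part of the original article.

```python
import numpy as np

def linear(x, a=1.0):
    # f(x) = ax
    return a * x

def step(x):
    # binary step: 1 for x >= 0, otherwise 0
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # logistic function: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # hyperbolic tangent
    return np.tanh(x)

def relu(x):
    # rectified linear unit: max(0, x)
    return np.maximum(0.0, x)

# A neuron with four inputs, as pictured above: weighted sum plus bias,
# passed through an activation function to produce the output.
inputs = np.array([0.5, -1.2, 3.0, 0.1])
weights = np.array([0.4, 0.3, -0.2, 0.9])
bias = 0.1
print(relu(np.dot(weights, inputs) + bias))
```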
Training is the process of optimizing the weights. Its goal is to minimize the prediction error and bring the network to a desired level of accuracy. The most popular method for determining each neuron's contribution to the error is called backpropagation; it computes the gradient of the loss function.
The system can be made more flexible and more powerful by using additional hidden layers. A deep neural network (DNN) is an artificial neural network with many hidden layers between the input and output layers that can model complex non-linear relationships.
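As a minimal sketch of this training loop, the snippet below fits a single sigmoid neuron with gradient descent; the chain-rule computation of the error gradient is backpropagation in its simplest possible form. The data, target and learning rate are arbitrary illustrations.

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0, 0.1])   # inputs
w = np.zeros(4)                        # weights to be learned
b = 0.0
target = 1.0
lr = 0.1

for epoch in range(100):
    z = np.dot(w, x) + b                 # weighted sum
    y = 1.0 / (1.0 + np.exp(-z))         # sigmoid activation
    error = 0.5 * (y - target) ** 2      # squared-error loss
    grad_z = (y - target) * y * (1 - y)  # chain rule: dL/dz
    w -= lr * grad_z * x                 # dL/dw = dL/dz * x
    b -= lr * grad_z                     # dL/db = dL/dz

print(y, error)
```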

1. Multilayer Perceptron (MLP)

A multilayer perceptron (MLP) has three or more layers. It uses a non-linear activation function (typically the hyperbolic tangent or the logistic function), which allows it to classify data that is not linearly separable. Because every node in one layer is connected to every node in the following layer, the network is fully connected. Speech recognition and machine translation are two important applications of the multilayer perceptron in linguistic data processing.
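A minimal PyTorch sketch of such a fully connected network with a tanh non-linearity is shown below; the input, hidden and output sizes are illustrative only and not taken from the article.

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(300, 128),   # input layer -> first hidden layer
    nn.Tanh(),             # non-linear activation
    nn.Linear(128, 64),    # second hidden layer
    nn.Tanh(),
    nn.Linear(64, 2),      # output layer, e.g. two classes
)

x = torch.randn(8, 300)    # a batch of 8 feature vectors
logits = mlp(x)            # shape: (8, 2)
```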

2. Convolutional neural network (CNN)

A convolutional neural network (CNN) consists of one or more convolutional layers, which may be pooled or fully connected, and is a variation of the multilayer perceptron described above. Convolutional layers apply a convolution operation to their input and pass the result to the next layer. This operation allows the network to be deeper while using far fewer parameters.
Convolutional neural networks achieve excellent results in image and speech applications. In "Convolutional Neural Networks for Sentence Classification", Yoon Kim examines the process and results of applying convolutional neural networks to text classification tasks [1]. He builds a model on top of word2vec, runs a series of experiments with it and evaluates it against several benchmarks to show that the model performs excellently. Convolutional neural networks can achieve remarkable performance without any knowledge of words, phrases, sentences or other syntactic or semantic structures of a human language.
This was demonstrated by Xiang Zhang and Yann LeCun in "Text Understanding from Scratch" [2]. The main applications of convolutional neural networks are semantic parsing [3], paraphrase detection [4] and speech recognition [5].
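The sketch below shows the general idea of a CNN for sentence classification in PyTorch: word embeddings, convolutions over several window sizes, max-pooling over time and a final classifier. The vocabulary size, embedding dimension, filter sizes and data are illustrative assumptions, not details from Kim's paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # convolutions over 3-, 4- and 5-word windows
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, 100, kernel_size=k) for k in (3, 4, 5)]
        )
        self.fc = nn.Linear(3 * 100, num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        # convolve, apply ReLU, then max-pool over time for each filter size
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # class scores

model = TextCNN()
scores = model(torch.randint(0, 10000, (4, 50)))   # 4 sentences of 50 tokens
```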

3. Recursive Neural Network (RNN)

A recursive neural network (RNN) is a type of deep neural network that applies the same set of weights recursively over a structured input. Its task is to produce a structured prediction over input structures of variable size, or a scalar prediction over them, by traversing the structure in topological order [6]. In the simplest architecture, child nodes are combined into parent nodes using a non-linearity such as tanh and a weight matrix that is shared across the whole network.
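A minimal NumPy sketch of this composition step is shown below: two child vectors are merged into a parent vector with one shared weight matrix and a tanh non-linearity, and the same function is reused at every node of the tree. The dimensions and example vectors are illustrative assumptions.

```python
import numpy as np

dim = 50
W = np.random.randn(dim, 2 * dim) * 0.01   # weight matrix shared across the tree
b = np.zeros(dim)

def compose(left, right):
    # parent = tanh(W [left; right] + b), applied recursively at every node
    return np.tanh(W @ np.concatenate([left, right]) + b)

# e.g. combining the (random placeholder) word vectors of "neural" and "network"
neural = np.random.randn(dim)
network = np.random.randn(dim)
phrase = compose(neural, network)
```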

4. Recurrent Neural Network (RNN)

A recurrent neural network (RNN), in contrast to a feedforward neural network, is a variant of a recursive artificial neural network in which the connections between neurons form a directed cycle. This means that the output depends not only on the current inputs but also on the neuron states of the previous step. This memory helps with problems in computational linguistics such as handwriting recognition and speech recognition. In the article "Natural Language Generation, Paraphrasing and Summarization of User Reviews with Recurrent Neural Networks", the authors present a recurrent neural network model that can generate new sentences and document summaries [7].
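The recurrence itself can be sketched in a few lines of NumPy: the hidden state from the previous step is fed back in, so the state after the last step depends on the entire input history. The dimensions and the random input sequence are illustrative assumptions.

```python
import numpy as np

in_dim, hid_dim = 10, 20
W_xh = np.random.randn(hid_dim, in_dim) * 0.01   # input-to-hidden weights
W_hh = np.random.randn(hid_dim, hid_dim) * 0.01  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hid_dim)

h = np.zeros(hid_dim)                            # initial hidden state ("memory")
sequence = [np.random.randn(in_dim) for _ in range(5)]
for x_t in sequence:
    # the new state depends on the current input and the previous state
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
```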
In "Recurrent Convolutional Neural Networks for Text Classification", Siwei Lai, Liheng Xu, Kang Liu and Jun Zhao describe a recurrent convolutional neural network for text classification that needs no human-designed features. They compared their model with existing text classification methods such as bag of words, bigrams + LR, SVM, LDA, tree kernels, recursive neural networks and CNNs, and showed that it outperforms the traditional methods on all data sets used [8].

5. Long short-term memory (LSTM)

Long short-term memory (LSTM) is a specific recurrent neural network architecture. It can model temporal sequences and their long-range dependencies more accurately than conventional RNNs [9]. LSTM has the following special features: it uses no activation function within its recurrent components, the stored values are not changed, and the gradient does not tend to vanish during training. LSTM units are usually implemented in "blocks" containing several units. These blocks have three or four "gates" (e.g. input gate, forget gate, output gate) that control the flow of information using the logistic function.
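The sketch below shows one LSTM step in NumPy, with the input, forget and output gates each squashed by the logistic function as described above. The weight shapes and inputs are illustrative assumptions, and the gate layout follows the common textbook formulation rather than any specific paper cited here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps [h_prev; x] to the stacked pre-activations of all gates
    z = W @ np.concatenate([h_prev, x]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gates take values in (0, 1)
    c = f * c_prev + i * np.tanh(g)                # update the cell ("memory")
    h = o * np.tanh(c)                             # exposed hidden state
    return h, c

hid, inp = 8, 4
W = np.random.randn(4 * hid, hid + inp) * 0.01
b = np.zeros(4 * hid)
h, c = np.zeros(hid), np.zeros(hid)
h, c = lstm_step(np.random.randn(inp), h, c, W, b)
```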
Hasim Sak, Andrew Senior and Françoise Beaufays demonstrated in "Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling" that deep LSTM-RNN architectures achieve state-of-the-art performance in large-scale acoustic modeling.
Peilu Wang, Yao Qian, Frank K. Soong, Lei He and Hai Zhao presented in "Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network" a model for part-of-speech tagging [10]. The model achieved a tagging accuracy of 97.40%. Apple, Amazon, Google, Microsoft and other companies have integrated LSTM as a fundamental element into their products.

6. Sequence-to-sequence models

A sequence-to-sequence model typically consists of two recurrent neural networks: an encoder that processes the input and a decoder that generates the output. The encoder and decoder can use the same or different sets of parameters.
The main applications of this model are question-answering systems, chatbots and machine translation. Such multilayer cells have been used successfully in sequence-to-sequence models for translation, as described in the study "Sequence to Sequence Learning with Neural Networks" [11].
The paper "Paraphrase Detection Using Recursive Autoencoder" describes a novel recursive autoencoder architecture. Its peculiarity is that the representations are vectors in an n-dimensional semantic space in which phrases with similar meanings lie close together [12].
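A hedged PyTorch sketch of the encoder-decoder idea is shown below: one recurrent network reads the source sequence, its final state initialises a second recurrent network that produces scores over the target vocabulary. The vocabulary sizes, dimensions and random inputs are illustrative assumptions, not the setup of the cited papers.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, emb=128, hid=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # the encoder's final state summarises the input sequence
        _, state = self.encoder(self.src_embed(src_ids))
        # the decoder starts from that state and generates the output sequence
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)
        return self.out(dec_out)                     # scores over target tokens

model = Seq2Seq()
scores = model(torch.randint(0, 8000, (2, 12)), torch.randint(0, 8000, (2, 10)))
```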

7. Shallow neural networks

Just like deep neural networks, shallow models are popular and useful tools. For example, word2vec is a group of shallow two-layer models used to create word embeddings. Word2vec, presented in "Efficient Estimation of Word Representations in Vector Space", takes a large corpus of text as input and produces a vector space [13]. Every word in the corpus is assigned a corresponding vector in this space. The peculiarity of this model is that words which share common contexts in the corpus are located close to one another in the vector space.
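The snippet below is a hedged sketch of training such a model with the gensim library (assuming gensim 4.x); the toy corpus is far too small to produce meaningful embeddings and serves only to show the API.

```python
from gensim.models import Word2Vec

corpus = [
    ["neural", "networks", "process", "natural", "language"],
    ["word", "vectors", "capture", "word", "context"],
    ["networks", "learn", "from", "language", "data"],
]

# train a small skip-gram model on the toy corpus
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

vector = model.wv["language"]                 # the embedding of one word
similar = model.wv.most_similar("language")   # nearest words in the vector space
```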

Summary

This article described different variants of artificial neural networks: the multilayer perceptron (MLP), the convolutional neural network (CNN), the recursive neural network (RNN), the recurrent neural network (RNN), long short-term memory (LSTM), sequence-to-sequence models and shallow neural networks with word2vec for word embeddings. It covered how these networks work and how their different types are used in linguistic data processing, and noted a difference in their typical uses: convolutional neural networks are mainly used for text classification tasks, whereas recurrent neural networks are often used for natural language generation or machine translation.
If you would like to learn more about the use of artificial neural networks in linguistic data processing, feel free to contact the AI-United.de team by email or Q&A.
Resources
  1. http://www.aclweb.org/anthology/D14-1181
  2. https://arxiv.org/pdf/1502.01710.pdf
  3. http://www.aclweb.org/anthology/P15-1128
  4. https://www.aclweb.org/anthology/K15-1013
  5. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CNN_ASLPTrans2-14.pdf
  6. https://en.wikipedia.org/wiki/Recursive_neural_network
  7. http://www.meanotek.ru/files/TarasovDS(2)2015-Dialogue.pdf
  8. https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9745/9552
  9. https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenTerm1201415/sak2.pdf
  10. https://arxiv.org/pdf/1510.06168.pdf
  11. https://arxiv.org/pdf/1409.3215.pdf
  12. https://nlp.stanford.edu/courses/cs224n/2011/reports/ehhuang.pdf
  13. https://arxiv.org/pdf/1301.3781.pdf
