BERT (==Bidirectional Encoder Representations from Transformers==) is a language model for natural language processing (NLP), developed by Google.
Introduced in the 2018 paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”.
It looks both forwards and backwards in the context.
BERT is a stack of Transformer encoders that learns context.
Input embeddings (see the tokenizer sketch after this list):
- Positional encoding: passes location information to the encoder
- Sentence (segment) embeddings: mark which sentence each token belongs to
- Token embeddings: vectors for the WordPiece tokens
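As a rough illustration of how these inputs come together, the sketch below (assuming the Hugging Face `transformers` library and the public `bert-base-uncased` checkpoint, neither of which is named in the notes above) shows the token ids and segment ids the tokenizer produces; positional information is added inside the model.

```python
# Sketch: inspecting the inputs that feed BERT's embedding layers.
# Assumes the Hugging Face `transformers` library is installed and the
# public "bert-base-uncased" checkpoint can be downloaded.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encoding a sentence pair yields token ids (token embeddings) and
# segment ids (sentence embeddings); positional embeddings are added
# by the model itself based on each token's position.
encoded = tokenizer("The cat sat.", "It was tired.", return_tensors="pt")

print(encoded["input_ids"])       # WordPiece token ids
print(encoded["token_type_ids"])  # 0 for sentence A, 1 for sentence B
print(encoded["attention_mask"])  # which positions are real tokens
```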
Training of BERT:
- Masked Language Modelling: hiding words and predicting them from the surrounding context (see the fill-mask sketch after this list)
- Next Sentence Prediction: deciding whether sentence B follows sentence A
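A minimal sketch of masked language modelling at inference time, assuming the `transformers` fill-mask pipeline and the `bert-base-uncased` checkpoint (both assumptions, not stated in the notes):

```python
# Sketch: masked language modelling — BERT fills in the hidden word using
# context from both sides of the [MASK] token.
# Assumes the Hugging Face `transformers` library is installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```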
Fine-tuning (transfer learning) a BERT model:
- Add a new, task-dependent output layer on top of the pretrained encoder (see the sketch below)
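A hedged sketch of that idea: load the pretrained encoder and attach a freshly initialised classification head (here via `BertForSequenceClassification`; the two-label setup, example texts, and single training step are illustrative assumptions):

```python
# Sketch: transfer learning — reuse the pretrained BERT encoder and train a
# new task-specific output layer (a sentence classifier in this example).
# Assumes the Hugging Face `transformers` library and PyTorch.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2 assumes a binary task such as sentiment analysis; the
# classification head is newly initialised and learned during fine-tuning.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

batch = tokenizer(["I loved it", "Terrible film"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One illustrative optimisation step; a real run loops over a labelled dataset.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
print(float(loss))
```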
Resources:
What is BERT?
- BERT is based on the Transformer architecture and utilizes a bidirectional approach, meaning it considers the context of a word based on both its left and right surroundings in a sentence. This allows BERT to capture nuanced meanings and relationships between words more effectively than unidirectional models (a small sketch of this contextual effect follows this list).
- It combines pre-training and fine-tuning (transfer learning): it learns to predict masked words in sentences (Masked Language Model) and to determine whether one sentence follows another (Next Sentence Prediction).
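To make the bidirectional-context point concrete, the sketch below (assuming `transformers` and PyTorch; the word choice and sentences are illustrative) compares the contextual vectors BERT produces for the same word in two different sentences:

```python
# Sketch: the same word gets different vectors depending on its surroundings,
# because BERT encodes each token using both left and right context.
# Assumes the Hugging Face `transformers` library and PyTorch are installed.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual hidden state of the first occurrence of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

river = word_vector("He sat on the bank of the river.", "bank")
money = word_vector("She deposited cash at the bank.", "bank")
print(torch.cosine_similarity(river, money, dim=0))  # noticeably below 1.0
```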
What is BERT Used For?
- Text Classification: Assigning categories or labels to text documents, such as sentiment analysis or topic classification (see the pipeline sketch after this list).
- Named Entity Recognition (NER): Identifying and classifying entities (e.g., names, organizations, locations) within text.
- Question Answering: Providing answers to questions based on a given context or passage of text.
- Text Summarisation: Generating concise summaries of longer documents while retaining key information.
- Language Translation: Assisting in translating text from one language to another.
- Sentence Similarity: Measuring the similarity between sentences, which can be useful for tasks like paraphrase detection or duplicate question identification.
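A short pipeline sketch for three of the tasks above. The checkpoint names are illustrative examples of publicly shared BERT-family models (assumptions, not part of the original notes), and their availability on the Hugging Face Hub is assumed:

```python
# Sketch: BERT-family models driving three of the tasks listed above via
# Hugging Face pipelines.
from transformers import pipeline

# Text classification (sentiment analysis) — a distilled BERT variant.
classify = pipeline("text-classification",
                    model="distilbert-base-uncased-finetuned-sst-2-english")
print(classify("The plot was gripping from start to finish."))

# Named Entity Recognition with a BERT model fine-tuned for NER.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print(ner("Google released BERT in 2018 in Mountain View."))

# Question answering over a short passage.
qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2")
print(qa(question="Who developed BERT?",
         context="BERT was developed by researchers at Google."))
```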