A Neural Probabilistic
Language Model
Learning a distributed representation for words
2018.08.09 Soo
Contents
● Fundamental Problem of Language Modeling
● Statistical Model of Language
● Neural Probabilistic Language Model
● Result
Fundamental Problem of Language Modeling
● Curse of Dimensionality
○ As the number of features or dimensions grows, the amount of data needed to generalize accurately grows exponentially
Number of free parameters when using discrete spaces to model language
Use a continuous space!
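To give a feel for the scale in the discrete case, here is a rough sketch. A full joint table over n consecutive words from a vocabulary of size |V| has |V|^n cells, one of which is fixed by normalization. The values |V| = 100,000 and n = 10 follow the illustration in Bengio et al. (2003), cited under References; the function name is only illustrative.

```python
def discrete_joint_free_params(vocab_size: int, n: int) -> int:
    """Free parameters of a full joint table over n discrete words.

    The table has vocab_size**n cells; one is determined by the others
    because the probabilities must sum to 1.
    """
    return vocab_size ** n - 1


params = discrete_joint_free_params(100_000, 10)
print(params == 10**50 - 1)  # True
```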
Statistical Model of Language
● The conditional probability of the next word given all the previous ones
● The reason for the approximation:
○ Temporally closer words in the word sequence are statistically more dependent → N-gram Models
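The slide's formulas did not survive extraction. As a reconstruction in the notation of Bengio et al. (2003), and not necessarily the exact form shown on the slide, the factorization and its n-gram approximation can be written as:

```latex
\hat{P}(w_1^T) = \prod_{t=1}^{T} \hat{P}(w_t \mid w_1^{t-1}),
\qquad
\hat{P}(w_t \mid w_1^{t-1}) \approx \hat{P}(w_t \mid w_{t-n+1}^{t-1})
```

Here w_i^j stands for the subsequence (w_i, ..., w_j), so the approximation keeps only the last n-1 words of the history.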
Statistical Model of Language
● N-gram model example
○ How likely is “University” given “New York”?
○ Count all occurrences of “New York University”
○ Count all occurrences of “New York ?”: e.g., “New York State”, “New York City”, “New York Fire”, “New York Police”, “New York Bridges”, …
○ How often does “New York University” occur among these?
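The counting recipe above can be sketched in a few lines. This is a minimal maximum-likelihood trigram estimate; the toy corpus and the function name are invented for illustration.

```python
from collections import Counter


def trigram_probability(tokens, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2) from raw counts."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Every occurrence of "w1 w2 ?", regardless of the third word.
    context_count = sum(
        count for (a, b, _), count in trigrams.items() if (a, b) == (w1, w2)
    )
    if context_count == 0:
        return 0.0  # context never seen; a real model would back off or smooth
    return trigrams[(w1, w2, w3)] / context_count


# Toy corpus, invented for illustration only.
corpus = (
    "new york university is in new york city . "
    "new york state borders new york city ."
).split()

p = trigram_probability(corpus, "new", "york", "university")
print(p)  # 0.25: one "new york university" among four "new york ?" trigrams
```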
Problems in Statistical Model of Language
● A new combination of n words appears that was not seen in the training corpus
● Solutions
○ Back-off trigram models (Katz, 1987)
○ Smoothed (or interpolated) trigram models (Jelinek and Mercer, 1980)
● Limits
○ No long-term dependencies: the context reaches no farther than 1 or 2 words back
○ No measure of similarity between words
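As a rough illustration of the interpolation idea, the sketch below mixes unigram, bigram and trigram maximum-likelihood estimates with fixed weights. It is not Katz back-off, and the weights are placeholders: in practice they would normally be estimated from held-out data. The function name and toy corpus are hypothetical.

```python
from collections import Counter


def interpolated_trigram(tokens, w1, w2, w3, lambdas=(0.1, 0.3, 0.6)):
    """Linear interpolation of unigram, bigram and trigram ML estimates.

    lambdas = (unigram, bigram, trigram) weights and should sum to 1.
    The default values are placeholders, not tuned weights.
    """
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))

    p_uni = uni[w3] / len(tokens)
    p_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0

    l1, l2, l3 = lambdas
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


# Toy corpus, invented for illustration only.
tokens = "new york city . new york state .".split()

# "in new york" never occurs, so the pure trigram estimate is 0,
# but the lower-order terms still give the word some probability.
p_unseen = interpolated_trigram(tokens, "in", "new", "york")
```

The point of the design is that an unseen trigram no longer gets probability zero; the limits listed above (short context, no notion of word similarity) remain.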
Neural Probabilistic Language Model
● Abstract:
○ Associate with each word in the vocabulary a distributed word feature vector (a real-valued vector in R^m)
○ Express the joint probability function of word sequences in terms of the feature vectors of the words in the sequence
○ Learn simultaneously the word feature vectors and the parameters of that probability function
Neural Probabilistic Language Model
1. Word Feature Vector
○ Each word is associated with a point in a vector space (the word is embedded into a vector space)
○ The dimension m of the vector is much smaller than the vocabulary size |V|
2. Probability function
○ product of conditional probabilities → maximization of log-likelihood
A sentence is a sequence:
Object:
Two Parts
Object:
1. Distributed feature vectors
(= Embedding Layer)
In practice, C is a |V| × m matrix in which each row is the feature vector of one word.
C is shared across all the words in the context.
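The embedding layer described above amounts to a row lookup in C. A minimal NumPy sketch, with toy sizes and random values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, m = 10, 4                  # |V| and embed size (toy values)
C = rng.normal(size=(vocab_size, m))   # row i is the feature vector of word i

context_ids = np.array([3, 7, 3])      # indices of the context words
context_vectors = C[context_ids]       # shape (3, m); the same C serves every position

# Row lookup is equivalent to multiplying one-hot vectors by C.
one_hot = np.eye(vocab_size)[context_ids]
assert np.allclose(one_hot @ C, context_vectors)
```

Because C is shared, the word with index 3 gets the same vector at the first and third context positions.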
Object:
2. Probability function (g)
(= Two-layer Neural Network)
Context vectors → Concatenate → Linear (Affine) → Tanh → Linear (Affine) → Softmax
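The pipeline above can be sketched as a single forward pass in NumPy. This is a minimal illustration, not the author's implementation: the parameter names H, d, U, b are borrowed from Bengio et al. (2003), the sizes and random initial values are toy assumptions, and the paper's optional direct input-to-output connections are left out because the pipeline on the slide does not show them.

```python
import numpy as np


def nplm_forward(context_ids, C, H, d, U, b):
    """Return P(next word | context) for one context of n-1 word indices."""
    x = C[context_ids].reshape(-1)   # look up and concatenate: ((n-1) * m,)
    hidden = np.tanh(H @ x + d)      # affine + tanh: (h,)
    logits = U @ hidden + b          # affine: (|V|,)
    logits = logits - logits.max()   # stabilise the softmax numerically
    exp = np.exp(logits)
    return exp / exp.sum()           # softmax over the vocabulary


rng = np.random.default_rng(0)
V, m, h, n = 50, 8, 16, 4            # toy sizes: vocab, embed, hidden, n-gram order
C = rng.normal(scale=0.1, size=(V, m))
H = rng.normal(scale=0.1, size=(h, (n - 1) * m))
d = np.zeros(h)
U = rng.normal(scale=0.1, size=(V, h))
b = np.zeros(V)

probs = nplm_forward(np.array([1, 5, 9]), C, H, d, U, b)
```

The output is one probability per vocabulary word, which is why the final layer costs on the order of |V| × h operations per prediction.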
Process:
Sizes:
- |V|: vocab size
- h: hidden size
- m: embed size
Loss:
● f : Neural Network
● R: Regularization (=weight decay)
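The formula itself was lost in extraction. As a reconstruction following Bengio et al. (2003), and so an assumption about what the slide showed, the training criterion is the penalized average log-likelihood, maximized over the parameters θ:

```latex
L = \frac{1}{T} \sum_{t} \log f(w_t, w_{t-1}, \dots, w_{t-n+1}; \theta) + R(\theta)
```

Here f is the network's estimate of the probability of w_t given the previous n-1 words, and R(θ) is the regularization (weight decay) term.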
Parameters:
Total number of free parameters:
- h: hidden size
- m: embed size
Sizes:
- |V|: vocab size
- h: hidden size
- m: embed size
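The count can be checked by summing the size of each parameter. The sketch below uses the parameter names of Bengio et al. (2003) (C, H, d, U, b, and the optional direct connections W), with n as the n-gram order; the closed form it is compared against, |V|(1 + nm + h) + h(1 + (n-1)m), is the paper's total with direct connections. The example sizes are arbitrary.

```python
def nplm_free_params(V: int, h: int, m: int, n: int, direct: bool = True) -> int:
    """Count free parameters by summing the size of each parameter."""
    sizes = {
        "C": V * m,            # embedding matrix
        "H": h * (n - 1) * m,  # input-to-hidden weights
        "d": h,                # hidden bias
        "U": V * h,            # hidden-to-output weights
        "b": V,                # output bias
    }
    if direct:
        sizes["W"] = V * (n - 1) * m  # optional direct input-to-output weights
    return sum(sizes.values())


# Arbitrary example sizes, for illustration only.
V, h, m, n = 10_000, 50, 30, 5
total = nplm_free_params(V, h, m, n)
assert total == V * (1 + n * m + h) + h * (1 + (n - 1) * m)
```

The dominant terms scale linearly with |V| and with n, in contrast to the exponential growth of a full discrete table.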
Test Measurement: Perplexity
A measure of how well a probability distribution or probability model (q) predicts a sample.
Lower perplexity means the model q predicts the sample better, i.e. the model is less surprised by the test sample.
Why the geometric average of inverse probabilities?
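One way to see it: the probability of a sequence is a product of per-word probabilities, so the natural per-word average is a geometric one, and it equals the exponential of the average negative log-likelihood. A small sketch with made-up probabilities (the function names are illustrative):

```python
import math


def perplexity(probs):
    """Perplexity from the probability the model assigned to each test word."""
    avg_neg_log = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(avg_neg_log)


def geometric_mean_inverse(probs):
    """The same quantity written as the geometric mean of 1/p."""
    product = math.prod(1.0 / p for p in probs)
    return product ** (1.0 / len(probs))


# Made-up per-word probabilities, for illustration only.
probs = [0.5, 0.25, 0.125, 0.25]
assert math.isclose(perplexity(probs), geometric_mean_inverse(probs))

# A model that is uniform over k words has perplexity k.
uniform = perplexity([0.25] * 4)
```

The log form is the one used in practice, since multiplying many small probabilities underflows quickly.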
Result
References
● A Neural Probabilistic Language Model - Yoshua Bengio et al., 2003
blog: https://simonjisu.github.io
github: https://github.com/simonjisu
