2nd Statistical Machine Learning Seminar (Jan. 19th)

Kenji Fukumizu

Dec 27, 2010, 10:55:30 AM
to ibi...@googlegroups.com
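Dear IBISML members,

The 2nd Statistical Machine Learning Seminar (hosted by the Machine Learning
NOE, The Institute of Statistical Mathematics) will be held as follows.
This session features talks by two researchers active in areas such as
nonparametric Bayesian methods.
No advance registration is required, so please feel free to attend.

Date: January 19 (Wed) 13:30--15:40
Place: Seminar Room 5 (3F, D313, D314), The Institute of Statistical Mathematics
(The talks will be given in English.)
Speakers: Yee Whye Teh (University College London)
Daichi Mochihashi (NTT CS Labs)


Organizer: Kenji Fukumizu, The Institute of Statistical Mathematics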

--
Dear all

We are pleased to announce that the 2nd Statistical Machine Learning Seminar
will be held as follows.

Date: January 19 (Wed) 13:30 - 15:40
Place: Seminar Room 5 (3F, D313, D314), The Institute of Statistical
Mathematics, http://www.ism.ac.jp/access/index_e.html
Speakers: Yee Whye Teh (University College London),
Daichi Mochihashi (NTT CS Labs)
Language: English

--
*Speaker 1: Yee Whye Teh (University College London)

Title: Hierarchical Bayesian Models of Language and Text

Abstract: In this talk I will present a new approach to modelling
sequence data called the sequence memoizer. As opposed to most other
sequence models, our model does not make any Markovian assumptions.
Instead, we use a hierarchical Bayesian approach which enforces sharing of
statistical strength across the different parts of the model. To make
computations with the model efficient, and to better model the power-law
statistics often observed in sequence data, we use a Bayesian nonparametric
prior called the Pitman-Yor process as the building block of the
hierarchical model. We show state-of-the-art results on language modelling and text
compression.
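
As a rough illustration of why the Pitman-Yor process suits power-law data, the following is a minimal sketch of its Chinese-restaurant-style seating scheme. It is not the sequence memoizer itself; the function name `pitman_yor_crp` and the parameter values are assumptions chosen for illustration, and the sketch assumes a discount in [0, 1) and a positive concentration.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Simulate table sizes under a Pitman-Yor Chinese restaurant process.

    Assumes 0 <= discount < 1 and concentration > 0 (illustrative choices).
    """
    rng = random.Random(seed)
    table_counts = []  # table_counts[k] = number of customers at table k
    for _ in range(n_customers):
        k = len(table_counts)
        # An existing table is chosen with weight (count - discount).
        weights = [count - discount for count in table_counts]
        # A new table is opened with weight (concentration + discount * k).
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_counts.append(1)
        else:
            table_counts[choice] += 1
    return table_counts


if __name__ == "__main__":
    for d in (0.0, 0.8):
        counts = pitman_yor_crp(2000, discount=d, concentration=1.0, seed=1)
        print(f"discount={d}: {len(counts)} tables, largest={max(counts)}")
```

With a discount of 0 the scheme reduces to the Dirichlet process, where the number of tables grows roughly logarithmically; a larger discount opens new tables much more often and yields the heavy-tailed table sizes associated with power-law statistics.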

This is joint work with Frank Wood, Jan Gasthaus, Cedric Archambeau and
Lancelot James.

Bio: Yee Whye Teh is a Lecturer (equivalent to an assistant professor
in the US system) at the Gatsby Computational Neuroscience Unit, UCL. He is
interested in machine learning and Bayesian statistics. His current focus is
on developing Bayesian nonparametric methodologies for unsupervised
learning, computational linguistics, and genetics. Prior to his appointment
he was a Lee Kuan Yew Postdoctoral Fellow at the National University of
Singapore and a postdoctoral fellow at the University of California, Berkeley.
He obtained his Ph.D. in Computer Science at the University of Toronto in
2003.

--
*Speaker 2: Daichi Mochihashi (NTT CS Labs)

Title:
Unsupervised and Semi-supervised Learning of Nonparametric Bayesian Word
Segmentation

Abstract:
For unsegmented languages such as Japanese and Chinese, word segmentation is
often the first step in natural language processing and has thus long been an
important problem. Lately, existing supervised methods are no longer adequate
for the non-standard colloquial text seen in blogs and on Twitter, which calls
for a novel method that automatically acquires new words in order to segment
such text appropriately.

In this talk, I introduce the first nonparametric Bayesian generative model
that can recognize words in an unsupervised fashion, even for a completely
unknown language. This model, called the nested Pitman-Yor language model
(NPYLM), can infer "words" from both their spellings and other "words", which
are also unknown in advance. MCMC with an efficient forward-backward algorithm
is used for inference, enabling the model to be applied to very large real-world
texts.
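
To give a feel for the forward-backward step, here is a minimal sketch of forward filtering followed by backward sampling of a segmentation. It is deliberately simplified to a unigram word model with a fixed toy lexicon, whereas the NPYLM described above also conditions on neighbouring words and learns its word model; `sample_segmentation`, `toy_word_prob`, and the lexicon are hypothetical names and values introduced only for illustration.

```python
import random

# Hypothetical toy lexicon; a real model would learn these probabilities.
TOY_LEXICON = {"the": 0.3, "cat": 0.2, "sat": 0.2}


def toy_word_prob(word):
    """Return a toy word probability; unknown strings get a small fallback."""
    return TOY_LEXICON.get(word, 0.01 ** len(word))


def sample_segmentation(text, word_prob, max_len=4, rng=None):
    """Sample one segmentation of `text` under a unigram word model."""
    rng = rng or random.Random(0)
    n = len(text)
    # Forward pass: alpha[t] sums the probability of every segmentation
    # of text[:t] into words of length at most max_len.
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for t in range(1, n + 1):
        for k in range(1, min(max_len, t) + 1):
            alpha[t] += word_prob(text[t - k:t]) * alpha[t - k]
    # Backward pass: starting from the end, sample the length of the last
    # word in proportion to its contribution to alpha[t].
    words = []
    t = n
    while t > 0:
        lengths = list(range(1, min(max_len, t) + 1))
        weights = [word_prob(text[t - k:t]) * alpha[t - k] for k in lengths]
        k = rng.choices(lengths, weights=weights)[0]
        words.append(text[t - k:t])
        t -= k
    words.reverse()
    return words


if __name__ == "__main__":
    print(sample_segmentation("thecatsat", toy_word_prob))
```

In a Gibbs-style sampler, a step like this would be repeated sentence by sentence, with the word model re-estimated from the sampled segmentations; that outer loop is omitted here.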

For the second part, I extend the NPYLM to semi-supervised learning using
Conditional Random Fields (CRF). Although the NPYLM can be regarded as a kind
of semi-Markov model, a naive combination with a semi-Markov CRF is prohibitive
and proves to work poorly. To cope with this problem, we convert the information
between the Markov CRF and the semi-Markov NPYLM to yield a consistent
combination of discriminative and generative models. We show results on
segmenting tweets, speech transcripts, and dialects using supervised data drawn
solely from newspapers, as well as results on standard datasets for Chinese
word segmentation.

* The latter half of the talk is joint work with Jun Suzuki and
Akinori Fujino (NTT CS Labs).

Bio:
Daichi Mochihashi is a Senior Research Associate (called Research
Specialist) at NTT Communication Science Laboratories, Kyoto. He received a BS
from The University of Tokyo and a PhD from the Nara Institute of Science and
Technology in 1998 and 2005, respectively. His research is primarily focused on
Bayesian methods in natural language processing. After several years at ATR
Spoken Language Communication Research Laboratories, he has been affiliated
with NTT CS Labs since 2007.


////-------------------------------------
Kenji Fukumizu
The Institute of Statistical Mathematics
10-3 Midoricho, Tachikawa, Tokyo 190-8562 Japan
Email: fuku...@ism.ac.jp
Tel: +81-50-5533-8540
URL: http://www.ism.ac.jp/~fukumizu/
--------------------------------------////
