Mona Diab

CCLS, Columbia Univ

Automatic Processing of Arabic(s)

Spoken by over 300 million people, Arabic is considered a language of significant importance for NLP, in particular for Machine Translation and multilingual processing. In this talk, I will explore the depth and breadth of the challenges that Arabic poses to NLP, which stem from a dearth of resources, high variability, a lack of writing standards for dialects, and insufficient understanding of dialectal phenomena.

Mona Diab is a Research Scientist at the Center for Computational Learning Systems (CCLS) and an Adjunct Associate Professor in the Computer Science Department at Columbia University. She has been working on models and methods that incorporate computational lexical semantics in NLP applications such as machine translation (MT), information extraction (IE) and information retrieval (IR). She has also worked on word sense disambiguation (WSD), Semantic Role Labeling, Social Media and Social Network Analysis. Mona adopts a multilingual perspective on the problems she addresses. One of her main areas of focus has been Arabic Natural Language Processing. She built AMIRA, one of the most widely used software tools for Arabic processing. She is the cofounder of the Columbia Arabic Dialect Modeling Group (CADIM) for the processing of Arabic and its dialects. CADIM is considered a reference point on Arabic processing in the USA. Before joining Columbia in 2005, Mona did her postdoctoral work under the supervision of Daniel Jurafsky at Stanford University, where she was part of the Stanford NLP group. Mona received her PhD in Computational Linguistics from the University of Maryland, College Park, where she worked with Philip Resnik and focused on WSD within a multilingual framework.
