HLTI
International Master in Human Language Technology and Interfaces
Courses and Exams
Introduction

The Master provides a set of theoretical courses (20 educational credits) and a set of advanced courses on HLTI technologies and applications (20 credits). Students are also involved in laboratory activities (20 credits) and need to pass a final exam (5 credits). The total is 65 credits.

Courses:

Course                                                  Credits
Speech Processing                                       5
Machine Learning for Natural Language Processing        5
Text Processing I                                       5
Introduction to Linguistics                             5
Spoken Dialog Systems                                   4
Text Processing II                                      4
Human Computer Interaction                              4
Language Resources and Ontologies                       4
Multilingual Technology                                 4
Laboratory of Human Language Technology and Interfaces  20


Prerequisites


I Semester

  • Speech Processing (5 credits): Speech is one of the most important means of human-human and human-machine communication. The course introduces students to the theoretical and algorithmic aspects of an automatic speech recognizer. The content of the course is divided into four parts. The first part covers the analysis of the speech signal, from the acoustic theory of speech production and phonetics to the extraction of appropriate spectral acoustic features. The second part introduces n-gram models, which are the most widely used statistical models for sequences of words. The third part describes hidden Markov models, the most successful statistical models of the acoustic realization of words. Techniques are presented for the reliable estimation of both acoustic models and language models. The fourth part introduces search algorithms that combine the two kinds of model to decode feature sequences. Lectures are coupled with lab sessions that provide hands-on teaching of the basic tools to analyze speech signals, extract features, and train models for an automatic speech recognizer.
  • Machine Learning for Natural Language Processing (5 credits): This course provides an introduction to the basic principles, techniques, and applications of Machine Learning to Natural Language Processing and to a large part of Information Retrieval. Topics include: basic concepts (Vector Spaces, Matrices and Probability Theory); introduction to statistical Machine Learning: Concept Learning, Decision Trees, Naive Bayes and Expectation Maximization; introduction to PAC Learning: formal PAC definition, examples of PAC-learnable functions, VC-dimension; Neural Networks: Perceptrons, Support Vector Machines, kernel methods and kernels for structured data; Unsupervised Learning: feature selection, clustering, k-means, LSA and pruning; performance measurements: empirical error estimation and hypothesis testing, n-fold cross-validation and other computer-intensive methods, computational complexity of learning and testing phases. The course then generalizes these techniques to typical natural language learning tasks: word sense disambiguation, part-of-speech tagging and named entity recognition. It closes with advanced learning approaches for texts: question categorization via syntactic parsing and tree kernels.
  • Text Processing I (5 credits): This course introduces the fundamental techniques to acquire and process textual data, transforming raw text into structured semantic information. Starting from the acquisition and basic processing of large textual databases (corpora), we cover part-of-speech tagging, lemmatization, syntactic parsing, measuring statistical association between terms, word sense disambiguation and computational/statistical methods to extract semantic information from text.
  • Introduction to Linguistics (5 credits): This course provides an introduction to foundational issues in theoretical linguistics, as well as an overview of some current research areas in the language sciences. It requires no prior knowledge of linguistics. Relevant topics of the course include: phonetics and phonology; morphology and lexical structure; syntax; sentential semantics; discourse structure and anaphora resolution. The examples will focus on Italian and English.


II Semester

  • Spoken Dialog Systems (4 credits): Human-machine communication is based on spoken dialog systems (SDS). This class will cover the algorithms, technology and architecture of SDS. Topics covered include: spoken language understanding; (robust) natural language parsing; semantic processing of text and spoken documents; dialog and user modelling; dialog corpora annotation; stochastic models of dialog; affective user modelling; voice user interfaces; languages and standards for dialog description. There will be lab sessions where students will experiment with tools and platforms for building Spoken Dialog Systems.
  • Text Processing II (4 credits): The course builds on Text Processing I and proposes an application-oriented perspective on Text Processing techniques. It is intended to develop both practical competences to be exploited in application settings and a preparation for active research in advanced areas of HLT. Relevant topics of the course include: Text Mining, a broad area of applications which includes techniques (e.g. text clustering, text categorisation) for the automatic organisation of large repositories of documents; Information Extraction from text, where the goal is to identify specific pieces of information in texts and collect them in a structured format (e.g. a database) so that they are easily accessible; techniques for Information Retrieval (such as text segmentation) and for Question Answering, where users pose a query and the system has to retrieve either documents or precise answers satisfying the query; automatic Summarization, where the content of large documents is made available in a concise and readable form. Practical labs with project-oriented activities will be offered to students.
  • Human Computer Interaction (4 credits): This course addresses the fundamentals of Human-Computer Interaction with emphasis on interaction design for multimodal systems. The main part of the course will focus on User-Centred Design, providing concepts and hands-on practice with techniques for collecting user requirements, lo-fi and hi-fi prototyping, and formative and summative evaluation. Additional topics covered include basic notions of cognitive ergonomics, psychometric theory and a crash course on rapid prototyping tools (for example, Macromedia Flash). During the course, students will have the opportunity to work on real projects, designing and evaluating prototypes of multimodal systems.
  • Language Resources and Ontologies (4 credits): The performance of HLT systems crucially depends on the quality of the resources used to train them (particularly annotated corpora) and of the resources that such systems can access at runtime (lexical resources such as machine-readable dictionaries and WordNet, and domain ontologies). This course will introduce students to the available resources and train them to use these resources and create new ones. Topics covered in the course include: creating and using annotated corpora; machine-readable dictionaries and lexica, focusing in particular on WordNet; domain ontologies, covering theoretical background, general-purpose ontologies such as DOLCE, and domain-specific resources.
  • Multilingual Technology (4 credits): This course will address the processing of linguistic content across different languages. The backbone of the course will be the theory and methods of statistical machine translation, which represents the current state of the art in this field. In particular, translation from both spoken and written language will be considered. Additional topics covered by the course will be cross-language information retrieval and cross-language information extraction. During the course, students will have the opportunity to perform experiments with state-of-the-art translation software on some real-life tasks, such as the translation of news from Chinese and Arabic to English, or the translation of European Parliament speeches from/to all its official languages.


Thesis project

  • Laboratory of Human Language Technology and Interfaces (20 credits): The laboratory activities focus on building interfaces based on language technologies. These activities will be carried out at the Master’s Lab or in industry.