Common Language Resources Infrastructure
- Universidade de Lisboa - Faculdade de Ciências (lead)
- Universidade de Évora (partner)
- Universidade de Lisboa - Faculdade de Letras (partner)
CLARIN LP is the national node of the European research infrastructure named Common Language Resources
and Technology Infrastructure (CLARIN).
The CLARIN European Research Infrastructure Consortium (CLARIN ERIC) is a transnational distributed research
infrastructure whose construction started in March 2012 and which at present has 17 member states, including Portugal,
whose accession took place in 2014.
The present construction phase was preceded by a 3.5-year preparation phase, supported by an FP7 project
(2008-2011). Prior to this preparatory project, an initial proposal was submitted in 2006 to a competitive call
of the European Strategy Forum for Research Infrastructures (ESFRI). It was evaluated positively, and CLARIN
was one of the 34 research infrastructures included in the first version of the ESFRI Roadmap.
CLARIN makes resources and technology available and useful to scholars and experts from all disciplines whose
topics of inquiry, development or innovation concern or are related to human language, with special relevance to the
humanities and social sciences, and to the cognitive and computational sciences.
It serves researchers who need to use a processing tool (e.g. a terminology extractor; a concordancer; etc.),
to obtain a set of data (e.g. utterances of sign language from deaf children in video records; the words for concepts
in the subontology of Organizations; etc.), or to use a fully equipped virtual workbench (e.g. to support field work to
document an endangered language; to do research on statistical machine translation; etc.).
It grants access to passive and active research materials and aids. These include datasets (e.g. linguistically
interpreted corpora; terminology banks; EEG recordings from neurolinguistic experiments; etc.), research-specific
applications (e.g. lemma frequency extractors; treebanking annotators; etc.), and language processing tools (e.g. POS
taggers; deep linguistic processing grammars; etc.). It also makes it possible for these assets to be combined, merged or
pipelined, which is what makes it much more than a mere repository of data.
The national node ensures access to the CLARIN ERIC trust domain. It is needed to give researchers in
Portuguese-speaking teams access to the international research infrastructure (RI), including its global repository of
datasets and, above all, its language processing aids and related research support web services.
At the same time, and crucially, the national node will grant access to datasets, processing tools and services that
are specific to the Portuguese language. This node is thus a sine qua non for advanced research worldwide
involving the Portuguese language to be supported by the research infrastructure.
Objectives, activities and expected/achieved results
CLARIN serves all experts whose topics of research, development or innovation concern or are related to language
and to the handling of language data:
- in all kinds of modalities: spoken, written, multimodal, etc.
- in all kinds of representations: audio, text, video, neuro-activity records, etc.
- and in all kinds of roles: symbolic object, instrument of communication, reflex of mental activity, cognitive skill to be
enhanced in education, skill to be trained in second language acquisition, carrier of content and knowledge, element
of cultural identity, natural way of interaction with appliances and artificial agents, etc.
The ultimate scientific aim of CLARIN is thus to foster a major leap forward in cutting-edge research that leads
to groundbreaking results in the scientific study of human language and in the technological progress and economic
development driven by it.
> Infrastructural services
To pursue its mission and accomplish its goals, CLARIN is designed to provide the research, education and industry
sectors with services and resources that individual interested parties could not make available or access in
isolation. They are:
- language processing services, which include: online services to be used by humans; applications specifically tailored
to support research tasks; and web services to support machine-to-machine interaction; etc.
- distribution, sharing and reuse of resources, which include: the repository and distribution of data collections,
processing tools and applications; their inventory and documentation; the archiving and digital preservation of
scientific and language heritage; etc.
- technical helpdesk, which includes: technical support for academic users; advice on licensing of resources; advanced
training; etc.
- consulting for non-academic entities.
Given their volume, diversity, interoperability and innovative character, these services are not only not redundant with
existing activities, organizations, centers, units, departments or teams, but also represent a key asset that can assist
them, be exploited by them, and raise the quality and volume of their output.
Through its unique multi-disciplinary strength, CLARIN grants access to language-related materials and computational
processing tools that, given their previous level of fragmentation, lack of interoperability, or even their sheer volume,
would otherwise not be widely available for use in research or to support innovation and education, and that actually
run the risk of being lost due to a lack of sustainable curation.
It will also grant access to research services and specialized aids that support researchers in designing and carrying out
instrumental tasks that require specific technical skills which would otherwise not be available to them.