Lexvo.org Frequently Asked Questions

What is the Semantic Web? What are URIs?

While much of the Web consists of text intended for human readers, the Semantic Web is an effort to provide information on the Web that machines can easily process in order to accomplish useful things. A URI can be regarded as an ID, an identifier for things on the Web like websites, real-world objects, or even somewhat more abstract entities like words, concepts, and languages. Please refer to Wikipedia for more information.

What does Lexvo mean?

The name Lexvo is derived from the Ancient Greek λεξικόν (lexicon) and the Latin vocabularium (or vocabulary). It is the name of a project that aims to provide lexicon-related services on the Web. The Lexvo.org Linked Data URIs are the first of these services.

Are the URIs Dereferenceable?

Yes. Using a normal web browser, you can access the URIs and receive a human-readable web page describing an entity (e.g. a word or a language). Alternatively, clients can use HTTP content negotiation to request a machine-readable RDF representation of the pertinent information about an entity.
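
As a minimal sketch of content negotiation in Python, the request below asks for RDF rather than HTML by setting the Accept header. The media type application/rdf+xml and the helper names are assumptions made for illustration; the formats the server actually offers may differ.

```python
import urllib.request


def build_rdf_request(uri, media_type="application/rdf+xml"):
    """Return a request that asks the server for an RDF representation."""
    # The Accept header is what drives HTTP content negotiation.
    # The default media type is an assumption, not a documented guarantee.
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri):
    """Fetch the RDF description of an entity (performs a network call)."""
    with urllib.request.urlopen(build_rdf_request(uri)) as response:
        return response.read().decode("utf-8")


request = build_rdf_request("http://lexvo.org/id/iso639-3/fra")
print(request.get_header("Accept"))
```

Calling fetch_rdf with the same URI would perform the actual download; it is kept separate so that the request can be inspected without network access.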

Should I use http://lexvo.org/id/... or http://www.lexvo.org/page/...?

You should use the URIs starting with http://lexvo.org/id/ as identifiers to refer to the language-related entities. The URIs starting with http://www.lexvo.org/page/, in contrast, only refer to web pages that happen to be about the respective entities. If a web browser accesses a URI starting with http://www.lexvo.org/id/, it will automatically be redirected to the corresponding web page.

Why URIs for Words/Terms?

String literals cannot serve as subjects of an RDF triple, so knowledge about words cannot conveniently be expressed using string literals alone. To express lexical knowledge, several ontologies have instead defined OWL classes that represent words or other terms in a language. However, the URIs for individual terms, i.e. the instances of such classes, were often created on an ad hoc basis when needed. For instance, the W3C draft RDF/OWL Representation of WordNet defined URIs for the words covered by the WordNet lexical database, but not for other words. Lexvo.org addresses this by providing predictable URIs for words in any ISO 639-3 language.

Linking to term URIs is especially useful for establishing the meaning of a non-information resource URI more clearly. For example, we might have a URI such as <http://www.some.org/#Frankfurt> that is supposed to refer to the city of Frankfurt in Germany. However, we should rely on factual data rather than mere appearances to derive this meaning, because it shouldn't matter to us whether the URI is named <http://www.some.org/#Frankfurt> or <http://www.some.org/#City348914>. One way of doing so is to clarify the meaning using a lexicalization relation:
<http://www.some.org/#Frankfurt> <lexvo:lexicalization> <lexvo:term/deu/Frankfurt%20am%20Main>
or
<http://www.some.org/#City348914> <lexvo:lexicalization> <lexvo:term/deu/Frankfurt%20am%20Main>
Now, it is clear in both cases that the URI can only denote entities that are called "Frankfurt am Main" in German.
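
To illustrate why the local name of the URI no longer matters, here is a small Python sketch that stores the two triples above as plain tuples and looks up the lexicalization of each subject. The abbreviated names are copied from the examples as written; the TRIPLES list and the lexicalizations function are hypothetical helpers, not part of any Lexvo.org API.

```python
# Triples written with the same abbreviated names as the examples above.
TRIPLES = [
    ("http://www.some.org/#Frankfurt", "lexvo:lexicalization",
     "lexvo:term/deu/Frankfurt%20am%20Main"),
    ("http://www.some.org/#City348914", "lexvo:lexicalization",
     "lexvo:term/deu/Frankfurt%20am%20Main"),
]


def lexicalizations(subject, triples=TRIPLES):
    """Return the set of term URIs that lexicalize the given subject."""
    return {o for s, p, o in triples
            if s == subject and p == "lexvo:lexicalization"}


named = lexicalizations("http://www.some.org/#Frankfurt")
opaque = lexicalizations("http://www.some.org/#City348914")
# Both subjects lead to the same German term, whatever their local names.
print(named == opaque)
```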

Why Language-Specific Term URIs?

Different levels of abstraction can be chosen. We made the pragmatic choice to consider two term entities distinct if their strings differ after Unicode NFC normalization, or if their ISO 639-3 codes differ, which is similar to the RDF semantics for literals. Thus we do not distinguish the meanings of polysemous words in a language, e.g. the verb and noun meanings of the English term "call". In contrast, we do consider the Italian term "burro", which means butter, distinct from the Spanish term "burro", which means donkey.
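
The identity rule described above can be sketched in a few lines of Python. The term_key function is a hypothetical helper: it pairs the ISO 639-3 code with the NFC-normalized string, so two terms count as the same entity exactly when their keys are equal.

```python
import unicodedata


def term_key(language_code, text):
    """Identity key for a term: language code plus NFC-normalized string."""
    return (language_code, unicodedata.normalize("NFC", text))


composed = "caf\u00e9"     # "é" as a single precomposed code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# Same language, same string after NFC normalization: one term entity.
print(term_key("fra", composed) == term_key("fra", decomposed))  # True
# Same string, different languages: two distinct term entities.
print(term_key("ita", "burro") == term_key("spa", "burro"))      # False
```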

How can I construct Lexvo URIs?

More information can be found on the Technical Details page. We also offer a simple Java API that allows you to create URIs for languages and terms, which is described in further detail on our Getting Started page.
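
For readers not using Java, the following Python sketch shows one plausible way to build such URIs. The language URI pattern follows the http://lexvo.org/id/iso639-3/fra example used elsewhere in this FAQ. The term URI pattern (http://lexvo.org/id/term/ followed by the language code and the percent-encoded string) is an assumption inferred from the abbreviated examples above, and the function names are hypothetical; the Technical Details page remains the authoritative description.

```python
import unicodedata
from urllib.parse import quote

LEXVO_ID_BASE = "http://lexvo.org/id/"


def language_uri(iso639_3_code):
    """Build a language URI from an ISO 639-3 code."""
    return LEXVO_ID_BASE + "iso639-3/" + iso639_3_code


def term_uri(iso639_3_code, text):
    """Build a term URI (assumed pattern) from a language code and a string."""
    # Normalize to NFC first, then percent-encode the UTF-8 bytes.
    normalized = unicodedata.normalize("NFC", text)
    encoded = quote(normalized, safe="")
    return LEXVO_ID_BASE + "term/" + iso639_3_code + "/" + encoded


print(language_uri("fra"))
print(term_uri("deu", "Frankfurt am Main"))
```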

Why not <rdfs:label> or <skos:prefLabel> instead of <lexvo:label>?

<lexvo:label> represents the semantic relation that holds between an entity and terms (words, names, etc.) commonly used to refer to it, e.g. between Albert Einstein and the string "Albert Einstein", or between the concept of books and the French term "livre" (NB: It is deliberately underspecified to apply to real-world entities as well as conceptual entities). RDF triples involving <lexvo:label> describe actual language use.

In contrast, <rdfs:label> is merely an annotation property used to assign human-readable labels to resources. Such labels can also be identifier strings such as minCardinality rather than genuine words or names used by a language community.

The SKOS label properties force us to make normative judgments about which label is preferred for a given entity. This makes sense within a single authoritative thesaurus, but is not appropriate for an open environment where we merely wish to describe which terms are commonly used to refer to something. This is why <lexvo:label> is defined to be a more generic super-property of <skos:prefLabel> and <skos:altLabel>.
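
The practical consequence of the super-property relation is that every SKOS label statement also implies a <lexvo:label> statement. The Python sketch below applies that single inference rule to a set of tuples; the ex:Book resource and its labels are hypothetical sample data, and a real application would normally leave this step to an RDFS reasoner.

```python
# Sub-property relations stated in the paragraph above.
SUBPROPERTY_OF = {
    "skos:prefLabel": "lexvo:label",
    "skos:altLabel": "lexvo:label",
}


def infer_lexvo_labels(triples):
    """Return the triples plus the lexvo:label statements they imply."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in SUBPROPERTY_OF:
            inferred.add((subject, SUBPROPERTY_OF[predicate], obj))
    return inferred


# Hypothetical sample data for illustration only.
sample = {
    ("ex:Book", "skos:prefLabel", "book"),
    ("ex:Book", "skos:altLabel", "volume"),
}
for triple in sorted(infer_lexvo_labels(sample)):
    print(triple)
```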

Should I use the LOC's ISO 639-2 URIs instead?

One advantage of using those URIs is that they are maintained by the Library of Congress. However, since there is a natural, simple, and well-defined scheme for transforming authoritative ISO 639-3 standard codes to Lexvo.org URIs and vice versa, Lexvo.org's language URIs are just as stable and will not become meaningless at any point in the future. Additionally, there are several other issues to consider. First of all, the code set that the LOC URIs are based on is orders of magnitude smaller than ISO 639-3 and, for example, lacks an adequate code for Cantonese, which has over 60 million speakers.
More importantly, the LOC's URIs do not describe languages per se but rather describe code-mediated conceptualizations of languages. This implies, for instance, that the French language (<http://lexvo.org/id/iso639-3/fra>) has two different counterparts at the LOC, <http://id.loc.gov/vocabulary/iso639-2/fra> and <http://id.loc.gov/vocabulary/iso639-2/fre>, which each have slightly different properties.
Finally, connecting your data to Lexvo.org's information is likely to be more useful in practical applications. It offers information about the languages themselves, e.g. where they are spoken, while the LOC mostly provides information about the codes, e.g. when the codes were created and updated and what kind of code they are.
In practice, you can also use both kinds of URIs simultaneously in your data. However, you need to be very careful to make sure that you are asserting that a publication is written in French rather than in some concept of French created on January 1, 1970 in the United States.
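
If your data already contains LOC URIs, a mapping step along the following lines can point both French counterparts at the single Lexvo.org URI. This is only a sketch under stated assumptions: the loc_to_lexvo function is hypothetical, the table contains just the French case mentioned above, and not every ISO 639-2 code has a direct ISO 639-3 equivalent, so a complete mapping would need the official code tables.

```python
LOC_PREFIX = "http://id.loc.gov/vocabulary/iso639-2/"
LEXVO_PREFIX = "http://lexvo.org/id/iso639-3/"

# ISO 639-2 codes that differ from the ISO 639-3 code of the same language.
# Only the French case from the text is listed; extend from official tables.
ALTERNATIVE_TO_ISO639_3 = {"fre": "fra"}


def loc_to_lexvo(loc_uri):
    """Map a LOC ISO 639-2 URI to the corresponding Lexvo.org language URI."""
    if not loc_uri.startswith(LOC_PREFIX):
        raise ValueError("not a LOC ISO 639-2 URI: " + loc_uri)
    code = loc_uri[len(LOC_PREFIX):]
    # Assumes the remaining code is also a valid ISO 639-3 code.
    code = ALTERNATIVE_TO_ISO639_3.get(code, code)
    return LEXVO_PREFIX + code


print(loc_to_lexvo("http://id.loc.gov/vocabulary/iso639-2/fre"))
print(loc_to_lexvo("http://id.loc.gov/vocabulary/iso639-2/fra"))
```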

Languages, etc. not covered by Lexvo.org

If you need a URI for a language variety not covered by Lexvo.org, e.g. Old English, one option is to use the corresponding DBpedia URI, if the variety has its own Wikipedia article.
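
DBpedia resource URIs are derived from English Wikipedia article titles, with spaces replaced by underscores. A minimal Python sketch of that convention follows; the dbpedia_uri function is a hypothetical helper that assumes a title without characters needing further escaping, and it does not check that the article actually exists.

```python
DBPEDIA_RESOURCE_BASE = "http://dbpedia.org/resource/"


def dbpedia_uri(wikipedia_title):
    """Build a DBpedia resource URI from a simple Wikipedia article title."""
    # Spaces in the article title become underscores in the URI.
    return DBPEDIA_RESOURCE_BASE + wikipedia_title.strip().replace(" ", "_")


print(dbpedia_uri("Old English"))
```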

Lexvo.org 2008-2016 Gerard de Melo.