This vignette shows how to use the eurlex R package to make SPARQL queries to retrieve data on European Union law.

Introduction

Dozens of political scientists and legal scholars use data on European Union laws in their research, yet the provenance of these data is rarely discussed. More often than not, researchers resort to the quick and dirty technique of scraping entire HTML pages from eur-lex.europa.eu. This is neither the optimal approach to retrieving data nor the one preferred by the server host, especially as the Publications Office of the European Union, the public body behind Eur-Lex, operates several dedicated APIs for automated retrieval of its data.

The allure of web scraping is completely understandable. Not only is it easier to download data that can be readily seen in a user-friendly manner through a browser, but using the dedicated APIs also requires technical knowledge of semantic web and Client URL (cURL) technologies, which is not necessarily widespread among researchers. And why go through the pain of learning how to compile SPARQL queries when it is much easier to simply download the web page?

The eurlex R package attempts to significantly reduce the overhead associated with using the SPARQL and REST APIs made available by the EU Publications Office. Although at present it does not offer access to the same array of information as comprehensive web scraping might, the package provides simpler, more efficient and more transparent access to data on European Union law. This vignette gives a quick guide to the package and an even quicker introduction to the Eur-Lex dataverse.

The eurlex package

The eurlex package currently assumes that the typical use case is getting bulk information about EU law and policy into R as fast as possible. The package contains three core functions to achieve that objective: elx_make_query() creates SPARQL queries based on user input; elx_run_query() executes the pre-made query or any other manually written one; and elx_fetch_data() sends GET requests for certain metadata to the REST API.
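The three functions are meant to be chained. The following is only a minimal sketch of that workflow, assuming the eurlex package is installed and the Publications Office endpoints are reachable. The wrapper name fetch_first_title() is hypothetical, and addressing a document through a CELEX-based URL is an assumption rather than something this vignette specifies.

```r
# Hypothetical wrapper chaining the three core functions.
# Nothing is sent over the network until the function is called.
fetch_first_title <- function(resource_type = "directive") {
  # 1. Build the SPARQL query as a character string (no network access).
  query <- eurlex::elx_make_query(resource_type = resource_type)

  # 2. Send the query to the SPARQL endpoint and collect the results.
  results <- eurlex::elx_run_query(query = query)

  # 3. Ask the REST API for the title of the first result that has a
  #    CELEX number.
  celex <- results$celex[!is.na(results$celex)][1]
  # Assumption: documents can be addressed through this CELEX-based URL.
  url <- paste0("http://publications.europa.eu/resource/celex/", celex)
  eurlex::elx_fetch_data(url = url, type = "title")
}
```

Calling fetch_first_title("directive") would then run all three steps in sequence; the sections below look at each step separately.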

The package also contains largely self-explanatory functions for retrieving data on EU court cases (elx_curia_list()) and Council votes (elx_council_votes(), currently dysfunctional) from outside Eur-Lex. More advanced users might be interested in downloading and custom-parsing XML notices with elx_download_xml().

elx_make_query(): Generate SPARQL queries

The function elx_make_query() takes as its first argument the type of resource to be retrieved from Cellar, the semantic database that powers Eur-Lex (and other publications).

library(eurlex)
library(dplyr)

query_dir <- elx_make_query(resource_type = "directive")

Currently, it is possible to choose from among a host of resource types, including directives, regulations and even case law (see the function description for the full list). It is also possible to manually specify a resource type from the eligible list (see note 1).

The choice of resource type is then reflected in the SPARQL query generated by the function:

query_dir %>% 
  cat()
#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
#>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
#>   select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/DIR>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_DEL>) 
#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} FILTER not exists{?work cdm:do_not_index "true"^^<http://www.w3.org/2001/XMLSchema#boolean>}. }

elx_make_query(resource_type = "caselaw") %>% 
  cat()
#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
#>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
#>   select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/JUDG>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/ORDER>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/OPIN_JUR>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/THIRDPARTY_PROCEED>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/GARNISHEE_ORDER>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/RULING>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/JUDG_EXTRACT>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/INFO_JUDICIAL>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/VIEW_AG>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/OPIN_AG>) 
#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} FILTER not exists{?work cdm:do_not_index "true"^^<http://www.w3.org/2001/XMLSchema#boolean>}. }

elx_make_query(resource_type = "manual", manual_type = "SWD") %>% 
  cat()
#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
#>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
#>   select distinct ?work ?type ?celex where{ ?work cdm:work_has_resource-type ?type.FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/SWD>) 
#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} FILTER not exists{?work cdm:do_not_index "true"^^<http://www.w3.org/2001/XMLSchema#boolean>}. }

There are various ways of querying the same information in the Cellar database due to the existence of several overlapping classes and identifiers describing the same resources. The queries generated by the function should offer a reliable way of obtaining exhaustive results, as they have been validated by the helpdesk of the Publications Office. At the same time, it is always possible that there will be issues on either the query or the database side; please report any you encounter through GitHub.
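To make the structure of the generated queries more concrete, the resource-type part of each query is a FILTER clause that joins one condition per authority code with a logical OR. The base R sketch below reproduces that pattern; build_type_filter() is a hypothetical helper written for illustration and is not how the package necessarily assembles its queries internally.

```r
# Hypothetical helper: build the resource-type FILTER clause of a query
# from a vector of authority-table codes such as "DIR" or "DIR_IMPL".
build_type_filter <- function(codes) {
  base <- "http://publications.europa.eu/resource/authority/resource-type/"
  # One "?type=<URI>" condition per code, joined with logical OR.
  conditions <- paste0("?type=<", base, codes, ">")
  paste0("FILTER(", paste(conditions, collapse = "||"), ")")
}

# The three codes that appear in the directive query.
directive_filter <- build_type_filter(c("DIR", "DIR_IMPL", "DIR_DEL"))
cat(directive_filter)
```

A single code, as in the manual "SWD" example, yields a filter with one condition and no OR operator.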

The other arguments in elx_make_query() relate to additional metadata to be returned. By default, the results include the CELEX number and exclude corrigenda (corrections of errors in legislation). Other metadata must be requested explicitly. Make sure to select options that are logically compatible with the resource type (e.g. case law does not have a legal basis). More options should be added in the future.

Note that the availability of data for each variable might affect the results: the data frame returned by the query might shrink to the size of the variable with the most missing data. It is recommended to always compare the results of a desired query to those of a minimal query requesting only CELEX ids.
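The comparison recommended above can be sketched in base R as follows. The helper missing_celex() is hypothetical, and the two data frames are toy stand-ins with placeholder ids, not real query results; the only assumption is that both results carry a celex column.

```r
# Hypothetical helper: list CELEX ids present in the minimal result
# but absent from the result of a query requesting extra metadata.
missing_celex <- function(minimal, extended) {
  setdiff(unique(minimal$celex), unique(extended$celex))
}

# Toy stand-ins for query results (placeholder ids, not real data).
minimal  <- data.frame(celex = c("ID_A", "ID_B", "ID_C"))
extended <- data.frame(celex = c("ID_A", "ID_C"),
                       date  = c("2000-01-01", "2001-01-01"))

dropped <- missing_celex(minimal, extended)
dropped
#> [1] "ID_B"
```

In practice, minimal and extended would be the results of running the minimal query and the query with additional include_* arguments, respectively. An empty result suggests that no documents were lost by requesting the extra variables.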

elx_make_query(resource_type = "directive", include_date = TRUE, include_force = TRUE) %>% 
  cat()
#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
#>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
#>   select distinct ?work ?type ?celex ?date ?force where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/DIR>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_IMPL>||
#>   ?type=<http://publications.europa.eu/resource/authority/resource-type/DIR_DEL>) 
#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} OPTIONAL{?work cdm:work_date_document ?date.} OPTIONAL{?work cdm:resource_legal_in-force ?force.} FILTER not exists{?work cdm:do_not_index "true"^^<http://www.w3.org/2001/XMLSchema#boolean>}. }

# minimal query: elx_make_query(resource_type = "directive")

elx_make_query(resource_type = "recommendation", include_date = TRUE, include_lbs = TRUE) %>% 
  cat()
#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
#>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
#>   select distinct ?work ?type ?celex ?date ?lbs ?lbcelex ?lbsuffix where{ ?work cdm:work_has_resource-type ?type. FILTER(?type=<http://publications.europa.eu/resource/authority/resource-type/RECO>||
#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DEC>||
#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DIR>||
#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_OPIN>||
#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_RES>||
#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_REG>||
#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_RECO>||
#>                    ?type=<http://publications.europa.eu/resource/authority/resource-type/RECO_DRAFT>) 
#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} OPTIONAL{?work cdm:work_date_document ?date.} OPTIONAL{?work cdm:resource_legal_based_on_resource_legal ?lbs.
#>     ?lbs cdm:resource_legal_id_celex ?lbcelex.
#>     OPTIONAL{?bn owl:annotatedSource ?work.
#>     ?bn owl:annotatedProperty <http://publications.europa.eu/ontology/cdm#resource_legal_based_on_resource_legal>.
#>     ?bn owl:annotatedTarget ?lbs.
#>     ?bn annot:comment_on_legal_basis ?lbsuffix}} FILTER not exists{?work cdm:do_not_index "true"^^<http://www.w3.org/2001/XMLSchema#boolean>}. }

# minimal query: elx_make_query(resource_type = "recommendation")

You can also choose not to specify any resource type, in which case all types of documents will be returned. As there are over a million documents with a CELEX identifier, this is unlikely to be efficient for most users. Since version 0.3.5, however, it is possible to request documents belonging to a particular “sector” or directory code.
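For orientation, the sector is also encoded as the first character of a CELEX number, followed by a four-digit year. The base R sketch below extracts both parts; the helper name split_celex() is hypothetical and not part of eurlex, and the identifier is used purely as an example of the format.

```r
# Hypothetical helper: split a CELEX number into sector and year.
# Assumes the usual layout: one sector character, then a 4-digit year.
split_celex <- function(celex) {
  data.frame(
    celex  = celex,
    sector = substr(celex, 1, 1),
    year   = as.integer(substr(celex, 2, 5)),
    stringsAsFactors = FALSE
  )
}

parts <- split_celex("32019L0790")
parts$sector
#> [1] "3"
```

A check of this kind can be handy for confirming that the documents returned by a sector-restricted query carry the expected leading character.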

# request documents from directory 18 ("Common Foreign and Security Policy")
# and sector 3 ("Legal acts")

elx_make_query(resource_type = "any",
               directory = "18",
               sector = 3) %>% 
  cat()
#> PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
#>   PREFIX annot: <http://publications.europa.eu/ontology/annotation#>
#>   PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
#>   PREFIX dc:<http://purl.org/dc/elements/1.1/>
#>   PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
#>   PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
#>   PREFIX owl:<http://www.w3.org/2002/07/owl#>
#>   select distinct ?work ?type ?celex where{
#>     VALUES (?value)
#>     { (<http://publications.europa.eu/resource/authority/fd_555/18>)
#>       (<http://publications.europa.eu/resource/authority/dir-eu-legal-act/18>)
#>     }
#>     {?work cdm:resource_legal_is_about_concept_directory-code ?value.
#>     }
#>     UNION
#>     {?work cdm:resource_legal_is_about_concept_directory-code ?directory.
#>       ?value skos:narrower+ ?directory.
#>     }
#>     
#>     ?work cdm:resource_legal_id_sector ?sector.
#>     FILTER(str(?sector)='3')
#>      
#>  FILTER not exists{?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM>} OPTIONAL{?work cdm:resource_legal_id_celex ?celex.} FILTER not exists{?work cdm:do_not_index "true"^^<http://www.w3.org/2001/XMLSchema#boolean>}. }

  1. Note, however, that not all resource types will work properly with the pre-specified query.