Adverse Event Reports

This server hosts a Linked Data version of the adverse event reports submitted to the US Food and Drug Administration (FDA).


The Adverse Event Reporting System (AERS)

The AERS dataset is one of the few remaining large, publicly available medical datasets that had not been published as Linked Data until now. An adverse event (AE) is an adverse change in health, or a side effect, that occurs while a patient is receiving treatment. A serious adverse event (SAE) is one that, among other outcomes, is life-threatening, results in death, requires hospitalisation or prolongation of existing hospitalisation, or results in persistent or significant disability or incapacity. In the US, known chemotherapy-related SAEs in breast cancer were linked to 22% of hospitalisations. From a clinical perspective, serious adverse events are therefore very important, and this is where Clinical Decision Support can make a real difference.


The AERS data files are published quarterly as zip files containing dollar-separated ($-delimited) tables. Each zip file is roughly 20 MB and is available from one of two separate static pages on the FDA website. Converting this data involves six steps:

  1. scrape the FDA website, then download and unzip the data dumps;
  2. check the integrity of the files and apply fixes where necessary: for instance, some rows contain line breaks in the wrong places, do not properly escape the separator character, or span fewer columns than expected;
  3. import the data into a MySQL database;
  4. dump the data to RDF using a D2RQ mapping;
  5. import the RDF into 4Store (exposing the data through D2R Server turned out to be too slow);
  6. expose the data through Pubby.
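Step 2 is the most fiddly part of the pipeline. The sketch below shows one way the three kinds of damage listed above could be handled while reading a $-separated table; it is an illustrative heuristic, not the fix-up code actually used for AERS-LD. The function name `repair_rows`, the three-column sample, and the rule of concatenating a broken record without inserting a space are all assumptions.

```python
def repair_rows(lines, sep="$"):
    """Split $-separated lines into records with the header's column count.

    Returns (rows, rejects): the repaired records, plus lines with too many
    fields (probably an unescaped separator) that need manual review.
    Hypothetical helper, for illustration only.
    """
    expected = len(lines[0].rstrip("\r\n").split(sep))
    rows, rejects = [], []
    pending = None  # a partial record waiting for its continuation line

    def pad(text):
        # Treat a short record as having empty trailing columns.
        parts = text.split(sep)
        return parts + [""] * (expected - len(parts))

    for raw in lines[1:]:
        line = raw.rstrip("\r\n")
        if pending is not None:
            # Assumption: the stray line break removed no characters,
            # so the two pieces are concatenated directly.
            joined = pending + line
            if len(joined.split(sep)) <= expected:
                line = joined              # continuation of the pending record
            else:
                rows.append(pad(pending))  # pending record was simply short
            pending = None
        parts = line.split(sep)
        if len(parts) == expected:
            rows.append(parts)
        elif len(parts) < expected:
            pending = line                 # may continue on the next line
        else:
            rejects.append(line)           # too many fields: flag, don't guess
    if pending is not None:
        rows.append(pad(pending))
    return rows, rejects


# Made-up sample; real AERS tables have more columns.
sample = [
    "ISR$DRUGNAME$ROUTE\n",
    "1$ASPIRIN$ORAL\n",
    "2$IBU\n",          # record broken by a stray line break
    "PROFEN$ORAL\n",
    "3$X$Y$Z\n",        # one separator too many
    "4$PARACETAMOL\n",  # missing its last column
    "5$A$B\n",
]
rows, rejects = repair_rows(sample)
```

A short row is ambiguous: it may be a broken record or a record with missing trailing columns. The sketch joins it with the next line unless that would overflow the column count, which can still misjoin two consecutive short rows, so rejected and padded rows are worth inspecting by hand.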

This conversion was implemented as a pipeline run through PROV-O-Matic, a Python provenance wrapper that generates provenance information expressed in the PROV-O vocabulary. The AERS-LD dataset covers all AERS reports from the years 2005-2012.
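The following is a hypothetical sketch of the general idea behind such a wrapper, not PROV-O-Matic's actual API: a Python decorator that, each time a pipeline step runs, records it as a `prov:Activity` with start and end times, the input files it `prov:used`, and the output file that `prov:wasGeneratedBy` it, serialised as Turtle. The namespace `http://example.org/provenance/`, the decorator `record_provenance`, the step `unzip_quarter` and the archive file name are made up for illustration.

```python
import datetime
import functools
import uuid
from urllib.parse import quote

# Hypothetical namespace for minted activity and entity IRIs.
BASE = "http://example.org/provenance/"
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)
provenance_log = []  # Turtle snippets, one per recorded step


def _now():
    return datetime.datetime.now(datetime.timezone.utc).isoformat()


def _entity(name):
    return f"<{BASE}entity/{quote(name)}>"


def record_provenance(step):
    """Describe each call of `step` (file names in, file name out) in PROV-O."""
    @functools.wraps(step)
    def wrapper(*input_files):
        activity = f"<{BASE}activity/{step.__name__}/{uuid.uuid4()}>"
        started = _now()
        output_file = step(*input_files)
        ended = _now()
        triples = [
            f"{activity} a prov:Activity ;",
            f'    prov:startedAtTime "{started}"^^xsd:dateTime ;',
            f'    prov:endedAtTime "{ended}"^^xsd:dateTime .',
        ]
        for name in input_files:
            triples.append(f"{activity} prov:used {_entity(name)} .")
        triples.append(
            f"{_entity(output_file)} a prov:Entity ; prov:wasGeneratedBy {activity} ."
        )
        provenance_log.append("\n".join(triples))
        return output_file
    return wrapper


@record_provenance
def unzip_quarter(archive):
    # Hypothetical pipeline step; a real one would extract the archive.
    return archive.replace(".zip", ".txt")


unzip_quarter("aers_ascii_2012q4.zip")
print(PREFIXES + "\n".join(provenance_log))
```

Chaining several decorated steps yields a provenance trail in which each output entity of one step appears as a `prov:used` input of the next.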

The AERS dataset is uniquely positioned among other HCLS datasets, offering opportunities to link to drug, location, patient and diagnosis information. Furthermore, AERS reports are filled in by hand. Linking out to other datasets can help with identity reconciliation (e.g. between drug names, marketing names and chemical substances) and with detecting misspellings (e.g. in manufacturer names). We specified links between AERS-LD and the UMLS, Sider, LinkedCT, DrugBank, DBpedia and CTCAE datasets using the Silk link specification language, resulting in over 60,000 links based only on exact string matches. Less exact matching on drug names can have unwanted consequences, since distinct drugs can have very similar names.
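Silk link specifications are declarative, but the core of an exact-string comparison is easy to illustrate. The Python sketch below is not Silk: it assumes each dataset has been reduced to a mapping from resource IRI to drug-name label, links only identical labels, and uses `owl:sameAs` as an assumed link predicate. The function `exact_match_links` and all IRIs are hypothetical. The last lines show why looser matching is risky: hydroxyzine and hydralazine are different drugs, yet their names are close under a generic string-similarity measure.

```python
from collections import defaultdict
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed link predicate


def exact_match_links(source, target, predicate=OWL_SAME_AS):
    """Link resources whose labels are identical strings.

    `source` and `target` map resource IRIs to labels; returns N-Triples lines.
    """
    by_label = defaultdict(list)
    for iri, label in target.items():
        by_label[label].append(iri)
    links = []
    for src_iri, label in source.items():
        for tgt_iri in by_label.get(label, []):
            links.append(f"<{src_iri}> <{predicate}> <{tgt_iri}> .")
    return links


# Hypothetical IRIs and labels, for illustration only.
aers_drugs = {
    "http://example.org/aers/drug/1": "ASPIRIN",
    "http://example.org/aers/drug/2": "Aspirin",  # different case: no match
    "http://example.org/aers/drug/3": "HYDROXYZINE",
}
target_drugs = {
    "http://example.org/target/drug/A": "ASPIRIN",
    "http://example.org/target/drug/B": "HYDRALAZINE",
}
links = exact_match_links(aers_drugs, target_drugs)

# Two different drugs with similar spellings: a fuzzy rule with a
# threshold below this ratio would wrongly link them.
similarity = SequenceMatcher(None, "HYDROXYZINE", "HYDRALAZINE").ratio()
```

Only the first AERS resource is linked here. The differently cased "Aspirin" is missed, and HYDROXYZINE is correctly left unlinked even though its similarity ratio to HYDRALAZINE is about 0.73, so a fuzzy rule with a threshold of 0.7 would have produced a wrong link.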

Linked Data in HCLS

The fields of health care and life science (HCLS) have traditionally received a lot of attention from the Semantic Web community, and vice versa: Semantic Web languages and their predecessors have proven to be a convenient paradigm for representing biomedical knowledge.

Vocabularies in the HCLS field are highly standardised, and computer analysis and computer-based information exchange are ubiquitous throughout the field (unlike, e.g., the humanities). As a result, many (bio)medical databases and terminologies are now published as Linked Data, making up about a quarter of the Linked Data cloud. Examples include medical vocabularies such as SNOMED CT, MeSH, MedDRA and the NCI Thesaurus (all part of the Unified Medical Language System (UMLS)), and datasets such as LinkedCT (clinical trials), Sider, DrugBank and RxNorm (drug information) and UniProt (protein sequences), to name but a few.

AERS reports from years outside 2005-2012 are available as separate dumps.

The Data2Semantics project is run by a consortium of VU University Amsterdam, the University of Amsterdam, Elsevier Publishing, Philips Research and Data Archiving and Networked Services (DANS) of the Royal Netherlands Academy of Arts and Sciences. Data2Semantics is funded under COMMIT.

Stay up to date by following us on GitHub.
