Hello! I am a software developer based in Brisbane, Australia. I use technologies like the Semantic Web and Linked Data to solve data interoperability problems for Australian research data.

My current interests are:

  • solving complex data interoperability problems with the Semantic Web
    • modelling with OWL ontologies in RDF
    • storing data in triplestores
    • writing efficient SPARQL queries for data retrieval
    • redistributing data as Linked Data
  • Big Data
  • Apache Spark

Education

Griffith University

Bachelor of Computer Science
Major in Software Development
Year 2018

SAE Institute

Bachelor of Interactive Media
Major in Games Design
Year 2014

Professional work experience

Ecological Data Integration

TERN (2019)

I am a software developer in TERN's Data Services and Analytics team.

I created the ETL framework for TERN's plot-based data. It uses multiprocessing to transform data received from TERN's data providers into RDF, aligned with TERN's Plot ontology.
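
A minimal sketch of this kind of pipeline, not the actual TERN framework: each source record is turned into N-Triples lines by a worker process, and the results are collected into one list. The namespaces, the `Site` class, the `site_id` field and the function names are all hypothetical placeholders, and only the Python standard library is used.

```python
from multiprocessing import Pool
from urllib.parse import quote

# Hypothetical namespaces, for illustration only.
DATA_NS = "https://example.org/data/"
ONT_NS = "https://example.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a string so it can be written as an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row):
    """Transform one source record (a dict) into a list of N-Triples lines."""
    # Percent-encode the identifier so it is safe inside an IRI.
    subject = f"<{DATA_NS}site/{quote(str(row['site_id']), safe='')}>"
    lines = [f"{subject} <{RDF_TYPE}> <{ONT_NS}Site> ."]
    for key, value in row.items():
        if key == "site_id":
            continue
        # Every other field becomes a plain string literal (an assumption;
        # a real mapping would follow the ontology's properties and datatypes).
        lines.append(f'{subject} <{ONT_NS}{key}> "{escape_literal(str(value))}" .')
    return lines


def transform(rows, processes=2):
    """Transform all records in parallel and flatten the per-row results."""
    with Pool(processes=processes) as pool:
        per_row = pool.map(row_to_ntriples, rows)
    return [line for lines in per_row for line in lines]


if __name__ == "__main__":
    sample = [
        {"site_id": "A1", "name": "North plot"},
        {"site_id": "B2", "name": "South plot"},
    ]
    for line in transform(sample):
        print(line)
```

Because each record is transformed independently, the work splits cleanly across processes, which is what makes multiprocessing a natural fit for this step.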

I also created TERN's controlled vocabulary management infrastructure. The system uses part of the Australian National Data Service (ANDS) vocabulary infrastructure to edit vocabularies in PoolParty and publish to the Research Vocabularies Australia (RVA) portal.

The controlled vocabularies were harvested from the RVA portal into VocView, a vocabulary register that I created. The purpose of VocView was to redistribute the vocabularies online as Linked Data through an API, as well as to provide a human-readable, custom-branded view.

See an example of VocView expressing CORVEG vocabularies here.
Slides: https://docs.google.com/presentation/d/16LJzvugRt9aZBm7fxlpwOlNvXa2lQrAJkGGt0HzD8ho/edit?ts=5d491949#slide=id.g5bbfc6604f_0_1289

CKAN

CSIRO Land and Water (2018)

The Comprehensive Knowledge Archive Network (CKAN) is an open-source data catalogue for data management and discovery. It is used world-wide by research organisations and government agencies.

CSIRO was working with the Queensland Government's Geological Survey of Queensland (GSQ) to help them migrate their old data warehouse.

CKAN was the chosen data catalogue to list GSQ's boreholes, seismic, and geochemistry data. The CKAN scheming extension was used to provide custom schemas tailored for GSQ's specialised datasets.

A custom CKAN theme was also created with Queensland Government branding. Further functionality, such as exposing the CKAN datasets with RDF (DCAT) metadata, was also built and mapped to the specialised GSQ datasets.

Location Integration Capability

CSIRO Land and Water (2018)

I was part of the technical infrastructure team that constructed Australia's first distributed, large-scale Linked Data knowledge graph, built to solve data interoperability issues across the whole of government.

A set of core location-based datasets, the Geocoded National Address File (G-NAF), the Australian Statistical Geography Standard (ASGS), and the Australian Hydrological Geospatial Fabric (Geofabric), were first transformed from tabular data to RDF, and subsequently redistributed as Linked Data online.

The three primary location-based datasets acted as the spine of the knowledge graph. The idea was that as long as a new dataset had a relationship (edge of a graph) to one of the primary datasets, then cross-querying across the whole knowledge graph was possible. The new edges were formed by defining relationships between the individual datasets with linksets.
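
The idea can be sketched with plain Python sets of triples. This is only an illustration, not the project's implementation: every identifier below (`gnaf:address/1`, `asgs:meshblock/9`, `ex:sensor/42`, the predicates) is made up, and a real deployment would query the data with SPARQL rather than walk it in memory.

```python
from collections import defaultdict, deque

# Each triple is (subject, predicate, object). All identifiers are invented.
GNAF = {("gnaf:address/1", "rdf:type", "gnaf:Address")}
ASGS = {("asgs:meshblock/9", "rdf:type", "asgs:MeshBlock")}
NEW_DATASET = {("ex:sensor/42", "rdf:type", "ex:Sensor")}

# A linkset holds only the edges that connect resources in different datasets.
LINKSETS = {
    ("gnaf:address/1", "geo:sfWithin", "asgs:meshblock/9"),
    ("ex:sensor/42", "ex:locatedAt", "gnaf:address/1"),
}


def build_index(*graphs):
    """Merge several triple sets into a subject -> {(predicate, object)} index."""
    index = defaultdict(set)
    for graph in graphs:
        for s, p, o in graph:
            index[s].add((p, o))
    return index


def reachable(index, start):
    """Return every node reachable from `start` by following edges forward."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, obj in index.get(node, ()):
            if obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen - {start}


if __name__ == "__main__":
    with_links = build_index(GNAF, ASGS, NEW_DATASET, LINKSETS)
    without_links = build_index(GNAF, ASGS, NEW_DATASET)
    print(sorted(reachable(with_links, "ex:sensor/42")))
    print(sorted(reachable(without_links, "ex:sensor/42")))
```

With the linksets loaded, the hypothetical sensor reaches the ASGS mesh block through its G-NAF address. Without them, it reaches nothing beyond its own dataset.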

During my time working on this project, I also helped recreate the Persistent Identifier Service. This service is now managed by the Australian Government Linked Data Working Group (AGLDWG).

LODE 2

CSIRO Land and Water (2018)

OWL ontologies are created using the RDF data model and are often serialised in one of the following formats: text/turtle, application/rdf+xml, or application/n-triples. These formats are generally not very human-readable, as they are designed for computers to understand. This often creates a technical barrier for ontologists trying to communicate their models effectively to their users.

LODE is a tool to parse ontologies and output them as human-readable web documents.

Though LODE provided a free online service for document generation, the server that hosted it was often down. Attempts to deploy a separate instance of LODE were also unsuccessful, as the codebase did not work out of the box.

I was tasked with fixing the issues impeding the deployment of LODE as well as improving some of the functionality of LODE, such as embedding WebVOWL to display a visual graph representation of an ontology.

See LODE 2 online in use by the AGLDWG.