
    Linked Data for places – any advice?

    January 6th, 2011

    We’d really benefit from advice about what Linked Data namespaces to use to describe places and the relationships between them. We want to re-use as much of others’ work as possible, and use vocabularies which are likely to be well and widely understood.

    Here’s a sample of a “vanilla” rendering of a record for a place-name in Cheshire as extracted from the English Place Name Survey – see this as a rough sketch.

    <rdf:RDF>
    <chalice:Place rdf:about="/place/cheshire/prestbury/bosley/bosley">
    <rdfs:isDefinedBy>/doc/cheshire/prestbury/bosley/bosley
    </rdfs:isDefinedBy>
    <rdfs:label>Bosley</rdfs:label>
    <chalice:parish rdf:resource="/place/cheshire/prestbury/bosley"/>
    <chalice:parent rdf:resource="/place/cheshire/prestbury/bosley"/>
    <chalice:parishname>Bosley</chalice:parishname>
    <chalice:level>primary-sub-township</chalice:level>
    <georss:point>53.1862392425537 -2.12721741199493</georss:point>
    <owl:sameAs rdf:resource="http://data.ordnancesurvey.co.uk/doc/50kGazetteer/28360"/>
    </chalice:Place>
    </rdf:RDF>

    GeoNames

    We could re-use as much as we can of the GeoNames ontology. It defines a gn:Feature to indicate that a thing is a place, and gn:parentFeature to indicate that one place contains another.
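    As a rough sketch of how that might look for Bosley (namespace prefixes omitted; the gn prefix would bind to the GeoNames ontology namespace):

    <gn:Feature rdf:about="/place/cheshire/prestbury/bosley/bosley">
    <gn:name>Bosley</gn:name>
    <gn:parentFeature rdf:resource="/place/cheshire/prestbury/bosley"/>
    </gn:Feature>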

    Ordnance Survey

    Ordnance Survey publish some geographic ontologies: there are some within data.ordnancesurvey.co.uk, and there’s some older work, including a vocabulary for mereological (i.e. containment) relations which includes isPartOf and hasPart. But the status of this vocabulary is unclear – is its use still advised?
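    If that vocabulary is still current, the containment could be stated in both directions – a sketch only, with the os prefix assumed rather than confirmed:

    <rdf:Description rdf:about="/place/cheshire/prestbury/bosley/bosley">
    <os:isPartOf rdf:resource="/place/cheshire/prestbury/bosley"/>
    </rdf:Description>
    <rdf:Description rdf:about="/place/cheshire/prestbury/bosley">
    <os:hasPart rdf:resource="/place/cheshire/prestbury/bosley/bosley"/>
    </rdf:Description>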

    The Administrative Geography ontology defines a ‘parish’ relation – this is the inverse of how we’re currently using ‘parish’ (i.e. Prestbury contains Bosley). (And our concepts of historic parish and sub-parish are terrifically vague…)
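    To follow the Administrative Geography usage our triples would have to point the other way – a sketch, with the admingeo prefix assumed here:

    <rdf:Description rdf:about="/place/cheshire/prestbury">
    <admingeo:parish rdf:resource="/place/cheshire/prestbury/bosley"/>
    </rdf:Description>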

    For place-names found in the 1:50K gazetteer, the OS use the NamedPlace class – but it feels odd to be re-using a vocabulary explicitly designed for the 50K gazetteer.

    Or…

    Are there other widespread Linked Data vocabularies for places and their names which we could be re-using? Are there other ways in which we could improve the modelling? Comments and pointers to others’ work would be greatly appreciated.


    Reflections on the second Chalice scrum

    January 6th, 2011

    We had a second two-week Scrum session on code for the Chalice project. This was a followup to the first Chalice scrum during which we made solid progress.

    During the second Scrum the team ran into some blocks and progress slowed. The following is quite a soul-searching post, in accordance with the project documentation instructions: “don’t forget to post the FAIL(s) as well: telling people where things went wrong so they don’t repeat mistakes is priceless for a thriving community.”

    Our core problem was the relative inflexibility of the relational database backend. We’d chosen to use an RDBMS rather than an RDF triplestore mainly for the benefits of code-reuse and familiarity, as this enabled us to repurpose code from a couple of similar EDINA projects, Unlock and Addressing History.

    However, when the time came to revise the model based on updated data extracted from EPNS volumes, this created a chain of dependencies – updates to the data model, then the API, then the prototype visualisation. Progress slowed, and not much changed in the course of the second sprint.

    A second problem was the lack of really clearly defined use cases, especially for a visual interface to the Chalice data. Here we have a bit of a chicken-and-egg situation: the work exploring how different archive projects can re-use the Chalice data to enhance their collections is still going on. This is something we’ll put more emphasis on during the latter part of the project.

    So on the one hand there’s a need for a working prototype to be able to integrate Chalice data with other resources; and on the other, a need to know how those resources will re-use the Chalice data to inform the prototype.

    So what would we do differently if we did it again?

    • More of a design phase before the Scrum proper starts – with time to experiment with different data storage backends
    • More work developing detailed use cases before software development starts
    • More active collaboration between people talking to end users and people developing the backend (made more difficult because the project partners are distributed in space)

    Below are some detailed comments from two of the Scrum team members, Ross and Murray.

    Ross: I found Scrum useful and efficient – great for noticing what others are doing, spotting when you’re heading down the wrong path, and identifying when you need further meetings, as was the case a few times early in the process. The whiteboard idea developed later on was also very useful. I don’t think the bottlenecks were anything to do with the use of Scrum, just with the amount of information and the quality of data we had available to us; maybe this is due partly to the absence of requirements gathering in Scrum.

    The data we received had to be reverse engineered to some extent. As well as figuring out what everything in the given format was for (such as regnal dates, alternative names, contained places and their location relative to parent) and which parts were important to us (such as which of the many date formats we were going to store, i.e. start, end and/or approximations), we also had no direct control over it.

    In order for the database, interface and API to work we had to decide on a structure quickly and get data into the database. Learning how to install and operate a triple store (the recommended method), or spending time figuring out how to get Hibernate (a more adaptable database access technology) to work with the chosen structure, would have delayed everything. So a trade-off was made: we manually wrote code to parse the data from XML and enter it into a familiar relational database, which caused us more problems later on. One of these was that the data continued to change with every generation; elements being added, removed or completely changed meant changing the parsing, then the domain objects, then the database, and lastly the database insertion code.

    Lack of use cases: from the start we were developing an app without knowing what it should look like or how it should function. We were unsure what data we should or would need to store, and how much control users of the service would have over the data in the database. We were unsure how to query the database and display API request responses so as to best fit the needs of the intended users in an efficient, useful way. We are slightly clearer on this now, but more information on how the product will be used would be greatly helpful.

    And as for future development… if we are sticking with the relational database model, I definitely think it’s wise to get rid of all the database reading/writing code in favour of a Hibernate solution. This would be tricky with our database structure, but more adaptable and symmetrical, so that changes to the input method are also made to the output and only one change needs to be made. Some sort of XML-POJO relational tool may also be useful to further improve adaptability, although it would make importing new datasets more complex (perhaps using XSLT). As well as that, some more specific use cases mentioning inputs and required outputs would be very useful.

    Murray: My comment would be that we possibly should have worked on a Hibernate ORM first, before creating the database. As soon as we had natural keys, triggers and stored procedures in the database, it became too cumbersome to reverse engineer them.

    If we had created an ORM mapping first we could have automatically generated the db schema from that, rather than the other way round. I presume we could write the searches, even the spatial ones, in Hibernate rather than stored procedures. Then it would be easier to cope with all the shifts in the XML structure. Propagating changes through the tiers would be a case of regenerating the db and domain objects from the mappings, rather than doing it by hand.

    The generated domain objects could be reused across the data loading, API and search. The default lazy loading in Hibernate would have been good enough to deal with the hierarchical nature of the data to an arbitrary depth.