Data Policy Questionnaire Results


Key objectives of the data policies

The questionnaire asked respondents to list the key objectives of their data policies, highlighting which of these were the most important and which were the most time-consuming. Key points that respondents made:

  • Linked Data is needed in both human-readable and machine-readable formats so that managers and developers alike can work with it. “To use the linked data API to make the data accessible as html” is important “as it delivers the data in a human readable as well as machine readable format and this is required for delivering the work to users (including senior management) in a format they can work with rather than limiting it to developers.” (A sketch of this kind of content negotiation is given after this list.)
  • Cross-domain standards are important because “The lack of recognised standards results in extensive manual transformations.”
  • Releasing data under a permissive license such as CC0 makes it hard to track the data. Not all users acknowledge the source of the data, which makes it hard to capture usage figures accurately and harder to identify innovative examples of use and new use cases.
  • Ownership and origin, authenticity and quality metrics (and thereby value) of data all need to be embedded in the data and metadata to identify whether data sets are being used legitimately, according to the licensing framework intended by the data author/owner/custodian/publisher. This is also needed to maintain, update and distribute data effectively, and to enable users to gain value from the data.
  • Open standards for naming and metadata are needed to enable plug-and-play interoperability and automatic discovery, and to avoid extensive manual transformations.
  • Metadata standardisation through ontology and linked data initiatives: “Releasing data from the constraints of standards and policies and embracing incentives for standardisation is the most compelling solution for a global company. Linked Data and Ontologies provide the mechanism and methodology for providing a community of loosely linked data that is effective, efficient and nimble.”
  • Building a metadata framework requires business buy-in; consensus building and education are time-consuming.
  • Ways of defining and dealing with uncertainty, including geo-uncertainty, are needed.
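
As a purely illustrative aside on the first point above, serving the same linked data resource in human-readable and machine-readable forms is usually a matter of content negotiation. The sketch below assumes a Python stack (Flask and rdflib) and invented resource and property names; it is not drawn from any respondent's system.

```python
# Minimal content-negotiation sketch: one resource URI, two representations.
# The resource, namespace and properties are invented for illustration only.
from flask import Flask, Response, request
from rdflib import RDF, Graph, Literal, Namespace, URIRef

app = Flask(__name__)
EX = Namespace("http://example.org/id/")


def build_graph(dataset_id: str) -> Graph:
    """Assemble a small RDF description of the requested dataset."""
    g = Graph()
    subject = URIRef(EX[dataset_id])
    g.add((subject, RDF.type, EX.Dataset))
    g.add((subject, EX.title, Literal("Example dataset")))
    return g


@app.route("/id/<dataset_id>")
def dataset(dataset_id: str) -> Response:
    g = build_graph(dataset_id)
    best = request.accept_mimetypes.best_match(["text/html", "text/turtle"])
    if best == "text/turtle":
        # Machine-readable view for developers and downstream services.
        return Response(g.serialize(format="turtle"), mimetype="text/turtle")
    # Human-readable view for managers and other non-developer users.
    rows = "".join(f"<li>{p.n3(g.namespace_manager)}: {o}</li>" for _, p, o in g)
    return Response(f"<html><body><ul>{rows}</ul></body></html>",
                    mimetype="text/html")


if __name__ == "__main__":
    app.run(debug=True)
```

The same URI serves both audiences; the representation returned simply depends on the Accept header sent by the client.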

Standards, regulations and other influences

All but one of the questionnaire recipients stated that they were subject to external influences on their data policies, whether industry- or technology-based, or stemming from Government or European requirements.

INSPIRE, the Stats Act, FOI, OGC, ISO, BS and proprietary technology standards were all mentioned. Data privacy was most often associated with legal regulations. Standards were most often applied to data access, data quality and data format policies. Data format policies were also influenced by best practice guidelines and contractual obligations.


Data access policies

This diagram suggests that most aspects of data policy could be helped to one degree or another by software tools and services, especially when it comes to limitations on the times at which data is available, as well as other limitations on data availability.

Can you think of any way that software tools or services might be useful for the implementation of your Access Policy?

  • What is described in the table above are ISO 19115 metadata elements relating to data products. For us, the key benefit of adopting linked data is moving away from a product-orientated view of the world to a data-orientated view. The tools or services described above will already be available should we decide to publish our existing product metadata as RDF, but our current model of the RDF data is that we are joining up the properties of data from several products into a single resource, which may not fit the options above (i.e. different properties from different products describing the same ‘thing’ may have different accessibility criteria). (A hypothetical sketch of this kind of merging is given after this list.)
  • An easy-to-use centralised hosting platform for mounting and describing data sets from organisations in standardised ways, for download or access, that would also capture usage information and so on, would be useful to lower barriers for new implementors.
  • (A publishing company) “uses data policies as an agent of change. Access is self determined: not everything is worthy of access. There are several curatorial levels on what combination of data should be accessible. As a content provider, we are obligated to keep very tight controls in order to pay creators and authors for their efforts. Does this include data? Yes. Content can be aggregated in multiple ways, tracking these instantiations is our current challenge.”
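
The first response above describes joining up properties from several products into a single resource whose properties may carry different accessibility criteria. The following is a hypothetical sketch of that merging using rdflib; the product names, properties and access labels are invented for illustration and are not taken from the respondent's metadata.

```python
# Hypothetical sketch: merge metadata from two "products" into one resource,
# keeping a per-property note of the access criteria each value arrived with.
# All identifiers and access labels below are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/def/")
THING = URIRef("http://example.org/id/thing/42")

# Per-product metadata, each record tagged with that product's access criteria.
product_records = [
    {"product": "ProductA", "access": "open",
     "properties": {EX.name: Literal("Thing 42")}},
    {"product": "ProductB", "access": "licensed",
     "properties": {EX.height: Literal(12.3)}},
]

merged = Graph()
provenance = {}  # property URI -> access criteria it was published under

for record in product_records:
    for prop, value in record["properties"].items():
        merged.add((THING, prop, value))
        provenance[prop] = record["access"]

print(merged.serialize(format="turtle"))
for prop, access in provenance.items():
    print(f"{prop} carries '{access}' access criteria")
```

The point of the sketch is simply that, once properties from differently licensed products share one resource, access criteria have to be tracked at the property level rather than the product level.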

Data quality policies

There is not much to say about the question responses relating to the measures of data quality. Most measures of data quality were seen as important, and no one measure stood out as being more significant than another, apart from ‘Compliance of your data against approved standards’, which was seen as important by five out of the six respondents.

If only I had a data quality tool or service that…?

  • Whether any tool could exist that would adequately allow a user to apply data quality filters that they understood (and which were intended in the same way by the data creator) to a data search or quality assurance process seems to me to be unlikely; however, metadata should always be available that allows the dedicated user to drill down into what they are using (though evidence suggests that this is rarely the case when third parties start using data in ways for which it was not intended!)
  • …“defined an open standard for interoperability. I am aware of the OGC SWE models and these go some way to addressing the metadata requirements.”
  • While pockets of businesses may find a use for a data quality tool, the real motivator is what compliance with policies would achieve. Demonstrations of data analysis results that could only be achieved by complying with policies would be far more powerful.

Maintenance and archiving

It appears that there are a variety of data update and refresh regimes in operation between and within organisations. The only suggestion that the respondents to the questionnaire have regarding tools to help with modelling and managing data updates and refreshes is again based in metadata and the need for data on the life cycle and history of a feature to be included as data with the feature. As a data issue, this is not within the scope of the RAGLD project, but tools to match data with similar update regimes and from the same temporal space might be; a sketch of such matching is given below.
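
A matching tool of the kind suggested above might, for instance, group data sets whose recorded update frequency is the same and whose temporal coverage overlaps. The sketch below illustrates that grouping logic under assumed metadata fields (update_frequency, start, end); these fields and the example records are not reported by the respondents.

```python
# Minimal sketch of matching data sets by update regime and temporal overlap.
# The metadata fields and example records are assumptions for illustration.
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    name: str
    update_frequency: str  # e.g. "monthly", "quarterly"
    start: date
    end: date


def compatible(a: DatasetRecord, b: DatasetRecord) -> bool:
    """Same update regime and overlapping temporal coverage."""
    same_regime = a.update_frequency == b.update_frequency
    overlap = a.start <= b.end and b.start <= a.end
    return same_regime and overlap


records = [
    DatasetRecord("boundaries", "quarterly", date(2010, 1, 1), date(2012, 1, 1)),
    DatasetRecord("addresses", "quarterly", date(2011, 6, 1), date(2013, 6, 1)),
    DatasetRecord("traffic", "monthly", date(2011, 1, 1), date(2012, 1, 1)),
]

pairs = [(a.name, b.name) for i, a in enumerate(records)
         for b in records[i + 1:] if compatible(a, b)]
print(pairs)  # [('boundaries', 'addresses')]
```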

Can you imagine a software tool or service that could help users of your data make the most of the way that you model and manage data updates?

  • Metadata and metadata access tools could, perhaps should, provide the means by which, where the license allows, the history of individual features within a data set (arguably extensible to individual raster tiles or even pixels) can be extracted. A rough sketch of such a tool follows.
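
As a rough illustration of what such a metadata access tool might expose, the function below returns the change history of an individual feature only where the data set's license permits it. The license flag, feature identifiers and the shape of the history records are assumptions made for this sketch, not part of any respondent's data model.

```python
# Rough sketch: expose per-feature history only where the license allows it.
# The license flag and the history record structure are assumed here.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeatureHistory:
    feature_id: str
    # e.g. [{"date": "2011-03-01", "change": "geometry updated"}]
    changes: List[dict] = field(default_factory=list)


@dataclass
class DatasetMetadata:
    license_allows_history: bool
    histories: Dict[str, FeatureHistory] = field(default_factory=dict)


def feature_history(meta: DatasetMetadata,
                    feature_id: str) -> Optional[FeatureHistory]:
    """Return the lifecycle history of one feature, if the license permits it."""
    if not meta.license_allows_history:
        return None  # the license does not allow history extraction
    return meta.histories.get(feature_id)


meta = DatasetMetadata(
    license_allows_history=True,
    histories={"feature-001": FeatureHistory(
        "feature-001", [{"date": "2011-03-01", "change": "geometry updated"}])},
)
print(feature_history(meta, "feature-001"))
```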

Data sharing models

The questions on the data terms of use and re-use employed by the questionnaire respondents and their data policies returned several points that the RAGLD project might want to consider when thinking about the design of linked data tools.

  • Data sources such as postcode data are core to any address-based data. In examples such as this, the resulting terms of use have to be restricted to the most rigid pricing and licensing model of the collaborating organisations.
  • Open data may increase sharing, but one respondent made the point that, to create data that is worth sharing, commercial models of data creation and provision also need to be catered for by tools and services: “why the obsession with ‘open’; misses the whole point of long term data quality and competitiveness of UK plc; we release some data under OS Open Data license but the focus is otherwise commercial and linked data initiatives of any flavour should be duty bound to (a) understand this and (b) ensure that “solutions” (be it tools or services) encompass commercial rights”
  • The questionnaire respondent that operates in a very commercial and global publishing environment, and that provides very little data for free, made the point that tracking data components where they may be re-used is not a commercially viable or customer-pleasing option: “legal and customer demand are at odds. New agreements are being created with broader rights for which a (publishing company) will pay a premium. It is no longer viable to labour over the tracking of components of content for 50p a chapter, for example. This impacts territorial restraints which must also be negotiated and re-visited.”

The points raised here are an interesting reflection of the complexity of sharing data, linked or otherwise, and the tension between sharing data for the benefit of society and the economy, producing quality content and metadata, and ensuring that data creators receive a fair return for their efforts by tracking and controlling data use.

If only I had a data license tool or service that…?

  • Automated the Digital Rights Management process – i.e. identity management, grants and permission management, etc. (see the sketch after this list).
  • Simplified licensing language that can be applied more broadly is a goal. Individual nuanced contracts with single authors may be a thing of the past.
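
To make the first wish slightly more concrete, the sketch below shows a minimal form of automated grant and permission checking. The identities, grants and actions are invented for illustration and do not describe any existing rights management system.

```python
# Minimal sketch of automated grant/permission checking for licensed data.
# Identities, grants and actions are invented for illustration only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    user: str
    dataset: str
    actions: frozenset  # e.g. frozenset({"view", "download"})


GRANTS = [
    Grant("alice@example.org", "postcode-lookup", frozenset({"view"})),
    Grant("bob@example.org", "postcode-lookup", frozenset({"view", "download"})),
]


def is_permitted(user: str, dataset: str, action: str) -> bool:
    """Check whether any grant gives this user the requested action on the dataset."""
    return any(g.user == user and g.dataset == dataset and action in g.actions
               for g in GRANTS)


print(is_permitted("alice@example.org", "postcode-lookup", "download"))  # False
print(is_permitted("bob@example.org", "postcode-lookup", "download"))    # True
```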

Metadata

The questionnaire respondents were asked to rate various aspects of data policies according to their perceived importance for data sharing and use. The results are as follows:

Numbers identify respondents; the rating columns, from left to right, are: not important, useful but not essential, important, very important.
Access policy 2, 3 1, 4, 6
Privacy policy 1, 2, 3, 4 6
Terms of use and re-use 3 1, 2 4, 6
Pricing structure 2 1, 6 3, 4
Archiving methodology 3 1, 2, 4, 6
Quality metrics 1,3 2, 6 4
Data provenance 3 1, 2, 6 4
Data standards used 1, 2, 6 3, 4, 2
Data format 3, 4 6 1, 2
Data type (quantitative, qualitative) 3 1, 2, 6 4
Domain or subject area of interest 4 1 2 3, 6
Date of creation 3 6 1, 2, 4
Time period data refers to 3 2, 6 1, 4
Units of measure used 3 1, 2 6 4
Data collection methodologies 2, 3 1, 6 4
Things referred to in the data 3 2, 6 1, 4
Relationships between things 3 4 1, 2, 6
Documentation location 3 1 2, 4, 6
Complete data set size 3 4, 6 1, 2
Data area of coverage 3 6 1, 2 4
Geo-Referencing method 3 2, 6 1 4

RAGLD notes

The aspects of data policy said to be most important to the most questionnaire respondents were access policy, date of creation, and defining the relationships between things. Data privacy policy, data provenance, data type, and documentation location were also described as being important to a number of the respondents.

Any tools or services developed by the RAGLD project should therefore ensure that these aspects of data are maintained with the data throughout, although it should be possible to include all aspects of a data policy should they be required.