E-Agriculture

Question 3: What are the emerging tools, standards and infrastructures?


The new paradigm for interoperability on the web, and for building the basic layer of a semantic web, is the concept of Linked Open Data (LOD)[1].

Instead of pursuing ad hoc solutions for the exchange of specific data sets, the concept of linked open data makes it possible to express structured data in a way that allows it to be linked to other data sets following the same principle. Examples of extensive use of linked open data technologies are the New York Times and the BBC news services. Some governments are also pushing strongly to publish administrative information as LOD.

[Figure: The Linking Open Data cloud diagram]


The technology of LOD is based on W3C standards such as the Resource Description Framework (RDF)[2], which facilitates the exchange of structured information regardless of the specific structure in which it is expressed at the source. Any database can easily be expressed in RDF, and so can structured textual information from content management systems. Presenting data in RDF makes it understandable and processable by machines, which can then mash up data from different sites. Mainstream open source data management tools such as Drupal and Fedora Commons already use RDF to present data.
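To make the idea of "expressing a database in RDF" concrete, here is a minimal sketch, using only the Python standard library, of how one row of a relational table could be turned into RDF triples and serialized in the N-Triples syntax. The base URI `http://example.org/publications/`, the column names and the choice of Dublin Core terms properties are hypothetical illustrations; a real mapping would use vocabularies agreed in the community, and a production system would more likely rely on an RDF library than on hand-written serialization.

```python
# Minimal sketch: expressing a (hypothetical) database row as RDF triples
# and serializing them as N-Triples. Standard library only.

DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/publications/"  # hypothetical base URI


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_triples(row, column_map):
    """Turn one row (a dict) into (subject, predicate, object) triples.

    column_map maps a column name to a property URI; columns missing
    from the map, and NULL (None) values, are skipped.
    """
    subject = BASE + str(row["id"])
    triples = []
    for column, prop in column_map.items():
        value = row.get(column)
        if value is not None:
            triples.append((subject, prop, str(value)))
    return triples


def to_ntriples(triples):
    """Serialize triples, writing every object as a plain literal."""
    return "\n".join(f'<{s}> <{p}> "{escape_literal(o)}" .'
                     for s, p, o in triples)


row = {"id": 42, "title": "Maize yield trials", "issued": "2010", "notes": None}
column_map = {"title": DCTERMS + "title",
              "issued": DCTERMS + "issued",
              "notes": DCTERMS + "description"}
print(to_ntriples(row_to_triples(row, column_map)))
```

Because every value becomes a triple about a URI, another site can link to `http://example.org/publications/42` and merge these statements with its own, which is what makes the mash-ups described above possible.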

In agricultural research for development, an infrastructure that facilitates the production of linked open data is needed. The four key elements to make this possible are:

   a registry of services and data sets (CIARD RING, http://www.ring.ciard.net);

   common vocabularies to facilitate automatic data linking (thesauri, authority files, value vocabularies);

   technology (content management systems, RDF wrappers for legacy systems);

   training and capacity development.

 



[1] Linked Data - Connect Distributed Data across the Web. http://linkeddata.org/ (last accessed March 2011)
[2] Resource Description Framework (RDF). http://www.w3.org/RDF/ (last accessed March 2011)

Valeria Pesce
Global Forum on Agricultural Research and Innovation (GFAR), Italy

I would like to start with a short list of interesting recent developments (in tools, standards and infrastructures) that can help achieve better interoperability of agricultural information, moving in the direction of Linked Open Data (LOD).

1) The publication of "authority data" relevant to the agricultural sector as Linked Open Data (here I include subject vocabularies, KOS, authority lists of special entities such as journals or authors, geographic entities, etc.). An example is Agrovoc. The geopolitical ontology is also ready to be published as LOD, and FAO has published an authority list of agricultural journals.

2) The mapping of some of these authority data sets to one another (e.g. Agrovoc to NALT, and several geographic encoding standards mapped in the geopolitical ontology).

3) Software tools (document management systems, content management systems, blogging platforms, etc.) moving towards LOD.
In the AgriDrupal community, we are experimenting with the Drupal CMS and its RDF features. Drupal can expose all content as a triple store, mapping all data in the system to classes and properties from any namespace (also through a SPARQL engine), and it can consume Linked Data by importing RDF records both from files and from SPARQL queries.

4) A very recent development: the preparation of "recommendations" for publishing bibliographic records as LOD. This is interesting because it goes beyond the concept of a rigid RDF schema: for each RDF property it proposes several options, taken from vocabularies such as Dublin Core, Bibo and AgMES, covering both literal values and URIs, with different options depending on the desired granularity of description. These recommendations should be published at the end of April.
Similar things could be done for other information types.

5) A portal that keeps track of all the information services and sources using these standards and tools: the CIARD RING.
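Point 2) above is about mapping authority data to one another. A minimal sketch of why such mappings matter, assuming they are published as SKOS `exactMatch` links: a search using a concept from one vocabulary can then also find records indexed with the equivalent concept from another. All concept URIs below are hypothetical placeholders, not real Agrovoc or NALT identifiers, and the in-memory lists stand in for whatever store a real service would query.

```python
# Minimal sketch: following skos:exactMatch mappings so that a search with a
# concept from one vocabulary also finds records indexed with the matching
# concept from another. All URIs are hypothetical placeholders.

SKOS_EXACT = "http://www.w3.org/2004/02/skos/core#exactMatch"

mappings = [
    ("http://example.org/vocabA/maize", SKOS_EXACT, "http://example.org/vocabB/corn"),
    ("http://example.org/vocabA/rice", SKOS_EXACT, "http://example.org/vocabB/rice"),
]

records = {
    "rec1": {"http://example.org/vocabA/maize"},  # indexed with vocabulary A
    "rec2": {"http://example.org/vocabB/corn"},   # indexed with vocabulary B
    "rec3": {"http://example.org/vocabB/rice"},
}


def equivalents(concept, triples):
    """Return the concept plus every concept linked to it by exactMatch.

    In SKOS, exactMatch is symmetric and transitive, so links are
    followed in both directions until no new concepts are found.
    """
    found = {concept}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p != SKOS_EXACT:
                continue
            if s in found and o not in found:
                found.add(o)
                changed = True
            elif o in found and s not in found:
                found.add(s)
                changed = True
    return found


def search(concept):
    """Return the ids of records indexed with the concept or an equivalent."""
    terms = equivalents(concept, mappings)
    return sorted(rid for rid, subjects in records.items() if subjects & terms)


print(search("http://example.org/vocabA/maize"))  # ['rec1', 'rec2']
```

Without the mapping triples, the same search would return only `rec1`; the mapping is what lets records described with different vocabularies be found together.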
 
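Point 3) mentions consuming Linked Data from SPARQL queries. As a generic illustration (not Drupal's own implementation), the sketch below builds a request in the style of the W3C SPARQL Protocol, with the query sent as the `query` parameter of an HTTP GET and the SPARQL JSON results format requested, and flattens such a response into simple rows. The endpoint URL and the query are hypothetical; `fetch` needs network access and a live endpoint, so only the offline parts are exercised in the usage note.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/sparql"  # hypothetical SPARQL endpoint

QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?doc ?title WHERE {
  ?doc dcterms:title ?title .
} LIMIT 10
"""


def build_request(endpoint, query):
    """Build a GET request carrying the query and asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Variables left unbound in a solution are simply absent.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def fetch(endpoint, query, timeout=30):
    """Send the query and return parsed rows (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Usage without a network connection, on a hand-written response:

```python
sample = ('{"head": {"vars": ["doc", "title"]}, "results": {"bindings": ['
          '{"doc": {"type": "uri", "value": "http://example.org/doc/1"}, '
          '"title": {"type": "literal", "value": "Soil fertility"}}]}}')
print(parse_bindings(sample))
# [{'doc': 'http://example.org/doc/1', 'title': 'Soil fertility'}]
```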

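Point 4) describes recommendations that offer, for each property, one option for literal values and another for URIs. The recommendations themselves are not reproduced here; the following is only a minimal sketch of that idea, assuming a hypothetical profile in which `dc:creator` and `dc:subject` (Dublin Core elements) carry plain literals while `dcterms:creator` and `dcterms:subject` (Dublin Core terms) carry URIs, for example from an authority file.

```python
# Minimal sketch of a flexible profile: each element offers one property for
# literal values and one for URIs. The pairing below is an illustrative
# assumption, not the content of the recommendations.

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

PROFILE = {
    "creator": {"literal": DC + "creator", "uri": DCTERMS + "creator"},
    "subject": {"literal": DC + "subject", "uri": DCTERMS + "subject"},
}


def is_uri(value):
    """Treat http(s) strings as URIs; anything else is a literal."""
    return value.startswith(("http://", "https://"))


def encode(record_uri, element, value, profile=PROFILE):
    """Return one N-Triples line, picking the property by value type."""
    options = profile[element]
    if is_uri(value):
        return f"<{record_uri}> <{options['uri']}> <{value}> ."
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{record_uri}> <{options["literal"]}> "{escaped}" .'


record = "http://example.org/rec/1"  # hypothetical record URI
print(encode(record, "creator", "Smith, J."))
print(encode(record, "creator", "http://example.org/person/7"))
```

A data provider that only has author names as text can still publish valid records, while one that has linked its authors to an authority list can publish URIs, without either being forced into a single rigid schema.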
Krishan Bheenick
Forum for Agricultural Research in Africa, Ghana

Thank you, Valeria, for enlightening us on the trends in tools and technologies.
I believe information of this kind, described at a general level that most of us can relate to, is what we need more of.


Too often, I have felt that the CIARD initiative is trying to talk to a broad range of specialists, each more interested in one area of the spectrum of disciplines that CIARD has to deal with. Each group of specialists, however, needs to use technical terms that others may not be familiar with. In the end we have useful conversations going on in pockets of specialised areas, and news of success stories or significant initiatives is not reaching the rest of the CIARD stakeholders.

I would like to suggest that we try to draw a conceptual model of how our interventions fit in the broad sense of what CIARD is trying to help us achieve.
If we take LOD as an example, it needs to be explained to all our stakeholders so that we all understand why a group among us is getting particularly excited about it; generally it is the people involved in vocabularies, ontologies and (?) who seem to be saying a lot about LOD. What do the extension and advisory services information specialists need to grasp about LOD that will also bring them on board? We need to bring some of these examples to the surface.


Over the past two questions, there has been an extended discussion of Research-Extension-Farmer linkages, with a new breed of Agricultural Information Managers sitting somewhere within that triangle. Multiple interlinked triangles of people:processes:technologies have also been mentioned, along with the need to define them.


So, in addition to tools, standards and infrastructures, we seem to need a better definition of the conceptual framework within which we are working. Perhaps I should say that the conceptual model needs to be formalized: we all seem to have an idea of what CIARD helps us to do, but we have not yet shared our mental models to agree on a formal model that describes the CIARD framework. Could this interfacing be visualized as those triangles interlinked at the people/processes/technologies points, so that they bridge people from different ends of the CIARD stakeholder spectrum? Perhaps that is an activity that could be tried out during a face-to-face gathering to discuss CIARD.

Hugo Besemer
Self-employed / Wageningen UR (retired), Netherlands

Well, I have tried to do some of what you propose: explain to a wider audience why we get excited about LOD. I included it in a general introduction on data repositories and data curation for the information committees within our science groups here. I might as well have spoken about Paracelsus' prognostications, so I have skipped it in later presentations.

I guess that people will understand it when they see a result, and it should be something that really could not have been done in another way. Just pulling in additional information from other sources is not good enough; people are used to Web 2.0 mashups.

Marco Fahmi
Institute for Sustainable Resources / Queensland University of Technology, Australia

In my work, I use two frameworks to present research based on aggregating and synthesising data. The first is a Researcher/Policy Maker/Grower framework that clearly identifies the benefits of the work for each of these groups. For scientists, this usually covers common themes discussed here, such as standardisation of data, but also higher exposure of research data and the (re-)use of data to build better models.

For the policy maker, publicising, standardising and sharing data is a win because it is a better investment of public money. Often the data is generated using public funding or public infrastructure, and open access to the data removes barriers to exploiting it better. It also means that research projects have tangible outcomes in the form of readily available data that can be used by scientists and possibly growers. Finally, the re-use element of data sharing is attractive because it means a better return on investment: the data can later be used for longitudinal studies or larger-scale models without any outlay on new data collection programs (by comparison, data management and curation is cheaper).

For the grower, the immediate benefits might be a little trickier to convey, but growers often grasp the need for better models, which can only be obtained with larger data sets with high temporal or geographic resolution.

The other framework I use is the People, Technology, Process continuum. It highlights that the technology part of data sharing is often the easiest or least problematic. We also have fundamental issues to deal with in research practice (When and how well do we document data? What is a data set? How do we guarantee that data is shared in a safe and secure way? How can we combine data from various sources in a meaningful way?)

The people part focuses on intrinsic and extrinsic rewards for sharing data: incentives to change attitudes about open access to data, and mandates to change behaviours. It also covers the ability to acknowledge and reward early adopters of open access, etc.

Thank you very much for all these interesting references. I received a draft version of the recommendations mentioned in point 4) and can say that it is very useful.

Diane

LOD is not as heavy as an ontology, but I am worried about its ability to express semantics.

Hugo Besemer
Self-employed / Wageningen UR (retired), Netherlands

For a complete picture: Linked Open Data may be the way to go for many types of data, but not for all. For many of us disk space seems unlimited, but some scientists manage to exceed those limits. Observational data such as spectral data sometimes come as multidimensional arrays, and may amount to terabytes. Even a simple character-based format like CSV may become too large. There is a binary exchange format, NetCDF (http://www.unidata.ucar.edu/software/netcdf/docs/faq.html). NetCDF also documents the data structure automatically (although, of course, all sorts of additional documentation is needed for re-use). There is also specific indexing software (http://opendap.org/) to query and transfer parts of a dataset; for files of this size, transport is also an issue.
Although linked open data is out of the question for the datasets themselves, the metadata that describes them may very well be LOD. An example of a repository that exposes its metadata as LOD and exchanges data as NetCDF is 3TU Datacentrum (http://datacentrum.3tu.nl/en/home/).
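To put rough numbers on this point, the sketch below stores a hypothetical time × latitude × longitude grid as packed 32-bit floats (a fixed-width binary layout similar in spirit to how NetCDF stores such variables, but not the NetCDF file format itself), compares its size with a CSV rendering of the same values, and then reads a sub-block by computing row-major offsets, which is the basic idea behind transferring only part of a dataset. The grid dimensions and values are made up for illustration.

```python
import array
import csv
import io

# Hypothetical grid dimensions: time x latitude x longitude.
NT, NY, NX = 4, 50, 60


def make_grid():
    """Fill a flat, row-major array of 32-bit floats with synthetic values."""
    values = array.array("f")
    for t in range(NT):
        for y in range(NY):
            for x in range(NX):
                values.append(t * 0.5 + y * 0.01 + x * 0.001)
    return values


def binary_size(values):
    """Bytes needed to store the values as packed binary floats."""
    return len(values.tobytes())


def csv_size(values):
    """Bytes needed for the same values as CSV rows of t,y,x,value."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["t", "y", "x", "value"])
    i = 0
    for t in range(NT):
        for y in range(NY):
            for x in range(NX):
                writer.writerow([t, y, x, f"{values[i]:.4f}"])
                i += 1
    return len(buf.getvalue().encode("ascii"))


def hyperslab(values, t_range, y_range, x_range):
    """Read a sub-block given half-open (start, stop) index ranges.

    Each contiguous run along the last axis is located by its row-major
    offset, so only the requested values are read.
    """
    x0, x1 = x_range
    out = []
    for t in range(*t_range):
        for y in range(*y_range):
            start = (t * NY + y) * NX + x0
            out.extend(values[start:start + (x1 - x0)])
    return out


grid = make_grid()
print(binary_size(grid), csv_size(grid))
print(len(hyperslab(grid, (1, 2), (10, 12), (5, 8))))  # 1 * 2 * 3 = 6 values
```

Even at this toy size the CSV version is several times larger than the binary one, and the gap grows with the number of dimensions and the precision of the values; reading by offset means a client interested in one region and time step never has to move the rest of the file.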
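The split described here, with binary data files and linked-data metadata, can be sketched as follows. The assumption is that the descriptive attributes have already been read from a file's header into a dictionary; only those attributes are published as triples about the file's URI, while the array values stay in the binary file. The file URI, the attribute values and the use of the `application/x-netcdf` media type string are hypothetical illustrations, not details of 3TU Datacentrum.

```python
# Minimal sketch: publishing only header metadata of a binary data file as
# N-Triples, using Dublin Core terms. All values are hypothetical.

DCTERMS = "http://purl.org/dc/terms/"

header = {
    "title": "Canopy reflectance spectra, 2009 field season",
    "creator": "Example Research Group",
    "format": "application/x-netcdf",
    "dimensions": {"time": 365, "wavelength": 2151},
}


def lit(text):
    """Quote and escape a string as an N-Triples literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def header_to_ntriples(file_uri, header):
    """Describe a data file using only its header attributes.

    The data values are never read; only descriptive attributes and a
    summary of the array shape are published as linked data.
    """
    lines = [f"<{file_uri}> <{DCTERMS}{key}> {lit(header[key])} ."
             for key in ("title", "creator", "format") if key in header]
    # Summarise the array shape as a human-readable extent statement.
    dims = header.get("dimensions", {})
    shape = " x ".join(f"{name}={size}" for name, size in dims.items())
    if shape:
        lines.append(f"<{file_uri}> <{DCTERMS}extent> {lit(shape)} .")
    return "\n".join(lines)


print(header_to_ntriples("http://example.org/data/spectra.nc", header))
```

The metadata record stays a few hundred bytes however large the data file grows, so it can be harvested, linked and searched like any other LOD resource.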

 

Marco Fahmi
Institute for Sustainable Resources / Queensland University of Technology, Australia

A number of organisations here in Australia have used NetCDF with great success. For example, it has been used to store remote sensing data (AusCover) and gas emissions data (OzFlux). It has also been used (with custom variations) to store marine data (eMII).

As Hugo notes, NetCDF is very convenient in that it encapsulates both data and metadata in one file without having to worry about low-level technical details such as the order of data rows, number formatting etc.

What is not encapsulated is a controlled vocabulary or an ontology; this has to be linked to externally. This is not necessarily a bad thing: a loosely coupled ontology means you can easily mix and match ontologies (or use none) based on your needs.
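A minimal sketch of what such loose coupling can look like: the file stores only a plain name for each variable, and any number of external ontologies (or none) are consulted at use time. The `standard_name` attribute follows the style of the CF conventions, which is an assumption since the post does not name a convention; both ontologies and their concept URIs are hypothetical.

```python
# Minimal sketch of loose coupling between data and ontologies: the file
# carries only a plain name; ontologies are chosen at use time.

# Hypothetical NetCDF-style variable attributes; no ontology is embedded.
variable_attrs = {"units": "K", "standard_name": "air_temperature"}

# Two hypothetical external ontologies mapping the same name to their
# own concept URIs.
ONTOLOGY_A = {"air_temperature": "http://example.org/ontoA/AirTemperature"}
ONTOLOGY_B = {"air_temperature": "http://example.org/ontoB/temp-air"}


def link_concepts(attrs, ontologies):
    """Look the variable's name up in each chosen ontology.

    Returns a dict of ontology label -> concept URI. Passing an empty
    dict of ontologies, or a name that no ontology knows, yields {}.
    """
    name = attrs.get("standard_name")
    links = {}
    for label, ontology in ontologies.items():
        if name in ontology:
            links[label] = ontology[name]
    return links


print(link_concepts(variable_attrs, {"A": ONTOLOGY_A, "B": ONTOLOGY_B}))
print(link_concepts(variable_attrs, {}))  # no ontology: {}
```

Swapping or adding an ontology only changes the lookup tables passed in, never the data files themselves.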

On the other hand, I think that preparing and sharing NetCDF documents is probably only convenient for large, homogeneous data sets where the scientific community has more or less agreed on the general format of the data. I wonder whether preparing NetCDF files (and discovering their content) is too burdensome for the individual researcher when data sets are small or there are no standards.

The CIARD RING, as well as this forum, is really useful for identifying tools and infrastructures. It also creates opportunities for partnerships.

 Dear All,

I am sorry for contributing late to this valuable e-discussion. My points will concentrate mainly on defining the main data users and beneficiaries, and on what our main purpose should be in this approach to data management and utilization for AR4D.

I am very concerned about whether it will be easy for everyone looking for such data to use it simply, without going through this long series of procedures and recommendations.

As a rural community development expert, I am very interested in how I can use this data when planning my programs and activities, how I can transfer it into actual programs and projects that benefit farmers, and how I can introduce appropriate technology and information to develop agriculture and improve farmers' standards of living.

They say that if you have the information, you have the power; but how can I use this information to develop techniques and methods for higher production, better quality and better markets?

Sharing information is essential as a first step towards maximizing our research benefits and achieving wider distribution and benefit. But without tools for using it in practice and applying it on the ground with the designated target groups, we will ONLY be adding new books to the shelf.

Thank you again for giving us this opportunity to take part in CIARD valuable initiatives and Best Regards,

Nabeel Abu-Shriha

Amman-JORDAN