E-Agriculture

Question 2: What are the prospects for interoperability in the future?

"Interoperabilty"1 is a feature both of data sets and of information services that gives access to data sets. When a data set or a service is interoperable it means that data coming from it can be easily "operated" also by other systems. The easier it is for other systems to retrieve, process, re-use and re-package data from a source, and the less coordination and tweaking of tools is required to achieve this, the more interoperable that source is.

Interoperability ensures that distributed data can be exchanged and re-used by and between partners without the need to centralize data or standardize software.
Some examples of scenarios where data sets need to be interoperable (one of them, aggregation, is sketched in code below):

   transferring data from one repository to another;
   harmonizing different data and metadata sets;
   aggregating different data and metadata sets;
   building virtual research environments;
   creating documents from distributed data sets;
   reasoning over distributed data sets;
   creating new information services using distributed data sets.
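
To make the aggregation scenario concrete, here is a minimal sketch in Python. It assumes the feedparser library and two hypothetical feed URLs: because both repositories expose the same standard format (RSS), a third party can merge their items without any coordination with either provider.

    import time
    import feedparser

    # Hypothetical feeds from two independently managed repositories.
    FEEDS = [
        "http://repository-a.example.org/rss",
        "http://repository-b.example.org/rss",
    ]

    items = []
    for url in FEEDS:
        feed = feedparser.parse(url)  # works because RSS is a shared standard
        for entry in feed.entries:
            items.append({
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "published": entry.get("published_parsed"),  # parsed date, if any
                "source": url,
            })

    # Newest first; entries without a parsed date sort last.
    items.sort(key=lambda i: i["published"] or time.gmtime(0), reverse=True)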


There are current examples of how an interesting degree of internal interoperability is achieved through centralized systems. Facebook and Google are the largest examples of centralized systems that allow easy sharing of data and a very good level of interoperation within their own services. This is due to the use of uniform environments (software and database schemas) that can easily make physically distributed information repositories interoperable, but only within the limits of that environment. What is interesting is that centralized services like Google, Facebook and other social networks are adopting interoperable technologies in order to expose part of their data to other applications, because the landscape of social platforms is itself distributed and users need easier access to information across different platforms.

Since there are social, political and practical reasons why centralization of repositories or homogenization of software and working tools will not happen, a higher degree of standardization and generalization ("abstraction") is needed to make data sets interoperable across systems.

The alternative to centralizing data or homogenizing working environments is the development of a set of standards, protocols and tools that make distributed data sets interoperable and make sharing possible among heterogeneous and uncoordinated systems ("loose coupling").

This has been addressed by the W3C with the concept of the "semantic web", whose goal is global interoperability of data on the WWW. The concept was proposed more than 10 years ago; since then the W3C has developed a range of standards to achieve this goal, notably the semantic description languages RDF and OWL, which are meant to get data out of isolated database silos and to add structure to text that was born unstructured. Interoperability is achieved when machines understand the meaning of distributed data and are therefore able to process it correctly.
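
As a minimal illustration of what this looks like in practice, the sketch below (in Python, assuming the rdflib library; the record URI is a placeholder and the AGROVOC concept identifier is shown only as an example) turns one repository record into RDF triples whose properties come from shared vocabularies, so that any consumer can interpret them:

    from rdflib import Graph, URIRef, Literal, Namespace

    DCTERMS = Namespace("http://purl.org/dc/terms/")
    doc = URIRef("http://repository.example.org/doc/42")  # hypothetical record

    g = Graph()
    g.bind("dcterms", DCTERMS)
    # The title is a plain literal; the subject is a URI from a shared
    # vocabulary (an AGROVOC concept, identifier shown for illustration).
    g.add((doc, DCTERMS.title, Literal("Effects of drought on maize yield")))
    g.add((doc, DCTERMS.subject, URIRef("http://aims.fao.org/aos/agrovoc/c_12332")))

    print(g.serialize(format="turtle"))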

1 Interoperability: http://en.wikipedia.org/wiki/Interoperability

Hugo Besemer
Self employed / Wageningen UR (retired), Netherlands

Linked Open Data is probably the way to go. But there is a chicken-and-egg dilemma here: why would people make the investment and expose their data if nobody comes to use it and there is little data to combine with?

I think CIARD, or the agricultural information community in general, can play a role. I can think of at least two ways:

- Formulate and find funding for projects that use LOD as a technology and that solve real-life problems. The way to engage in such projects should be to expose your data in the right format.

- As Diane pointed out (and I hinted at it), documenting the data (how it is collected and what the parameters mean) is a lot of work, and scientists need to provide most of it. Translating to LOD is still more work: you do not just have to say what the rows and columns in your spreadsheet mean, you should also think of the right encodings (URIs) for the things, properties and values. But this is something where the community (through CIARD or otherwise) can help by developing guidelines. AIMS has made a start by working on guidelines for the exchange of bibliographic information (LODE), but what about other forms of information typically exchanged in agriculture, like field trials, soil surveys, farm data etc.?

I am aware that I am inclined to talk about datasets in the first place; that is what I am working on at the moment. But I think much of this is also applicable to other forms of information, like news, project descriptions etc.


Johannes Keizer
FAO of the United Nations, Italy

There will not be a bulk transformation of data sets into "LOD"; this will happen in a pragmatic, case-by-case manner. A need comes up, e.g. "comprehensive information on data regarding animal feed", and the "needy" institution will contact the data owners to mobilize the data. In another scenario, a worldwide community of researchers in a specific area will do the necessary transformation work to foster collaboration. In a third scenario, a data owner will transform their data into "LOD" to push it more strongly into worldwide use.

In a way, data sets are much easier to transform into a format sharable through RDF/LOD. They are already structured and the meanings of fields and columns are defined; sometimes it is a straightforward transformation process. There remains the problem of "provenance" - with all kinds of data: text, numbers, pictures and others.
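
As a sketch of how straightforward such a transformation can be, the Python fragment below (assuming rdflib; the file name, base URI and column-to-property mapping are invented for illustration) turns a spreadsheet exported as CSV into RDF triples, one resource per row:

    import csv
    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://data.example.org/trial/")        # hypothetical base URI
    PROPS = {"crop": EX.crop, "yield_kg_ha": EX.yieldKgHa}  # column -> property map

    g = Graph()
    with open("trials.csv", newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            subject = EX[f"row{i}"]  # one RDF resource per spreadsheet row
            for column, prop in PROPS.items():
                g.add((subject, prop, Literal(row[column])))

    g.serialize(destination="trials.rdf", format="xml")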


Valeria Pesce
Global Forum on Agricultural Research and Innovation (GFAR), Italy

Hm... it seems nobody has realized that this thread has started... :-)

I think it is very important to agree on what interoperability is and what different levels and types of interoperability can be achieved.

I guess we all agree that a web page listing some database records in HTML is not interoperable, while the same list of records as an RSS 2.0 feed is interoperable. But do we all agree, for example, that the same list as an RSS 1.0 (RDF) feed, extended with richer metadata and using Agrovoc or NALT terms (or even better, URIs) for subject indexing, is more interoperable?

This is important because once we decide to make the effort of making our sources interoperable it is worth opting for the better forms of interoperability.

I would say that the level of interoperability of an information source corresponds to the number of "lives" that data from that source can live.

  • Information in an HTML page doesn't live any more lives than its own.
     
  • Information in a basic RSS 2.0 feed lives potentially infinite new lives in its "first generation": it re-lives in all the websites and RSS readers that display it. But basic RSS metadata do not allow for efficient filtering and semantic re-use of the information, and most websites and RSS readers cannot do much with basic RSS feeds besides displaying them, which in turn means HTML pages again, and no more "lives".
     
  • Information in an extended RSS 1.0 (RDF) feed can live the same lives as basic RSS 2.0 feeds in its "first generation", but with the additional advantage that RDF triples (and URIs) and richer metadata can be more easily re-processed and re-packaged as new interoperable sources, allowing information to live additional lives, through several "generations".
     
  • Information in a highly specialized XML format can be easily re-processed and re-packaged by specialized clients, but few consumers will be aware of the specialized metadata (provider and consumer are "tightly coupled"), thus limiting the number of "lives" it can live.

Usually, specialized formats, vocabularies and protocols allow for more advanced re-processing of the information, including semantic re-organizations, but only by specialized consumers, while simple protocols (RSS) in simple formats (plain-structure XML) using well-known vocabularies (DC, FOAF) are easily understood by any consumer (like RSS readers): provider and consumer are "loosely coupled".

These are two different types of interoperability, and it's not always easy to decide which one is better. In most cases the best option could be to expose data in more than one format / protocol.

In general, it seems to me that RSS 1.0 (RDF) feeds using extended metadata and URIs from standard vocabularies combine the best of the two worlds.
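
As a rough illustration of that option, the sketch below (in Python, assuming the rdflib library; the item URL is a placeholder and the AGROVOC identifier is shown only as an example) builds such a feed item. Since RSS 1.0 is RDF, the same item carries the plain title and link that any feed reader understands, plus a dc:subject URI for semantic consumers:

    from rdflib import Graph, URIRef, Literal, Namespace

    RSS = Namespace("http://purl.org/rss/1.0/")
    DC = Namespace("http://purl.org/dc/elements/1.1/")

    item = URIRef("http://news.example.org/item/1")  # placeholder item

    g = Graph()
    g.bind("rss", RSS)
    g.bind("dc", DC)
    g.add((item, RSS.title, Literal("New pest management guidelines released")))
    g.add((item, RSS.link, Literal("http://news.example.org/item/1")))
    # The extra metadata a plain RSS 2.0 feed cannot carry as a URI:
    g.add((item, DC.subject, URIRef("http://aims.fao.org/aos/agrovoc/c_5630")))

    print(g.serialize(format="xml"))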

Sanjay Chandrabose Sembhoo
Agricultural Research and Extension Unit, Mauritius

The concept sounds very appealing, as it will open doors for better information sharing. From another angle, it will also help avoid wastage of resources, for instance through duplication of work.

However, I must also confess that interoperability also looks like Utopia! For the simplest of reasons (as our friend Andry pointed out for Q1) - the Digital Divide.

For interoperability to work, websites, platforms etc. must have a minimum level of features that allows them to handle metadata (and all the associated bits and pieces) and to communicate among themselves.

Then there is also the issue of standards. Whenever you speak about this to heads of institutions, it is as if you are speaking Martian and as if you are asking for mega $$$$.

I believe, if we are to overcome such massive constraints, CIARD will have to be proactive and invite all NARS across the globe to first join the CIARD community.

By bringing them to a single platform, it would be easier to communicate with them and push forward ideas that in isolation would seem impossible to conceptualise.

Instead of explaining the process of interoperability, the best approach would be to first explain the benefits that THEY would reap from interoperability ...

...

Sorry for repeating this, but I wanted to bring this reply here. My name is Sallam, working in the Technology Dissemination Department of the Research Authority in Yemen.

I see that previous contributions have raised many issues, many of which go beyond information sharing but need to be reflected in future trends that help improve information documentation and sharing. Some issues - like "another continent … another dream", lack of knowledge of computers and Web 2.0, "my interest first", cultural heritage, the lack of knowledge-packed products that are in the interest of farmers, the lack of incentives among researchers (especially in many developing countries), the lack of a clear culture of sharing, and how to document and make outputs visible - could remain obstacles affecting information sharing.


I go back again to the persistent gap between the efforts of producing knowledge - which stays in the minds of researchers, in technical reports or even in scientific articles - and the efforts of integrating this knowledge into simple, visible outputs that are of interest to farmers, especially poor farmers. Many researchers in many countries think that their end product is publishing their research results in a scientific journal, where they gain scientific recognition and job promotion.


It would be wise to think of ways of motivating researchers to put effort into making visible outputs and success stories in formats that are in the interest of farmers rather than the sole interest of the scientific community. I can give a story from my experience: I tried to distil my work experience of the past 20 years into two success stories. One was published as a study, not as a scientific article, although it was reviewed; the other was published by GFAR in a competitive work. I also prepared many small booklets and leaflets that are useful for farmers, as they are supported by results from marketing surveys and a marketing information system. The issue is that when I submitted all this work for scientific promotion, all of it was rejected because it was not published in scientific journals - including the piece published by GFAR/AARINENA.


Now the issue is how we can think of suggestions that could help facilitate better recognition of research efforts and contribute to breaking the vicious circle in the integration of scientific and indigenous knowledge, as well as mechanisms that facilitate more participatory and farmer-centred approaches leading to suitable formats for publishing and sharing information.

Richard Kedemi
Kenya Agricultural and Livestock Research Institute, Kenya

I agree totally with Sanjay's contribution - that this is the way to go in terms of opening doors to sharing information, especially for developing countries - and I want to believe that even with the digital divide and our institutional heads not understanding the new terminologies, it can happen.

First, we need champions who can push the gospel of features and standards that enable sharing of information, by promoting platforms like CIARD as a reference point.

On the other hand, CIARD needs to document processes - for example Valeria's contribution, where she explains the pathways and the results (what you can achieve if you use RSS) - which would enable users to easily choose which tools work for them. Second, it should document best practices and success stories that can be used as case studies by others: for example, what we have done in KAINet has resulted in GAINS and ZAR4IN. Or look at what the guys at ILRI are doing in sharing information, despite being in a developing country with the digital divide.

I think that with interoperability we can dream with Johannes, but at different levels.

Johannes Keizer
FAO of the United Nations, Italy

In my country of origin, the discussion about animal feeds and their economic, ecological and social impact is a hot topic at the moment.

In a lunch meeting today with a view of the Spree (but on a telephone connection with Munich :-)) we discussed the possibility of a web portal that brings together all information on animal feeds.

Let's take soya as an example:

Different types of soya: in which feeds is it used, what are the formulations, for which animals is it used, which countries deliver it, who are the producers, how do trade streams evolve, prices, which pesticides are used, what are the residue limits, which incidents with contaminants have been reported, which analyses are published by the producers, what is the energy balance under different ecological conditions, which laws apply.

A lot of this information is available but hardly accessible, because of the time it takes a human to screen everything. A lot of information is also unavailable because neither the will nor an obligation to publish exists.

We discussed how to overcome these problems and set up a nice little prototype by December, but I will discuss the way to do this under the next topic. :-)

Now I want to point out three somewhat hierarchical conditions for sharing data and making shared data processing possible.

a) Data need to be public. This is not a technical but a societal issue. It is not a small issue, but awareness that openness and transparency are good is growing. This is the basis, but it creates only a dispersed universe of datasets (in a broad sense, considering that text is also data).

b) Data need to be published in a way that machines can process them. XML was created to make texts machine-processable and to exchange data between databases, but XML is only a syntax and does not express meaning, so the W3C added RDF and OWL to express meaning. There is technology to use data expressed in RDF or OWL; encoding your data this way is sometimes an investment, but it is not rocket science.

c) The semantics of different datasets must be understood by machines. This is much trickier and, without reference to common vocabularies and ontologies, very labour-intensive. If dispersed datasets refer to common published vocabularies/ontologies, it is much easier. This is the reason why our AIMS team in FAO is investing so heavily in AGROVOC and similar vocabularies.

If all these three conditions were met by all data sets, the construction of our portal on animal feeds would be really easy - and not only a portal on animal feeds: we could have "Google desktops" with the information we need for our work.
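
To illustrate why condition (c) buys so much, here is a small Python sketch (assuming rdflib; the two input files and the concept URI are invented for the example): two datasets from different providers that both use the same AGROVOC URI can be merged and queried as one, with no mapping work at all.

    from rdflib import Graph

    # Example AGROVOC concept URI standing in for "soya".
    SOYA = "http://aims.fao.org/aos/agrovoc/c_14477"

    g = Graph()
    g.parse("trade_data.rdf")      # one provider's dataset
    g.parse("residue_limits.rdf")  # another provider's dataset

    # Everything either dataset says about the shared concept, in one query:
    query = "SELECT ?s ?p WHERE { ?s ?p <%s> . }" % SOYA
    for row in g.query(query):
        print(row.s, row.p)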


Gerard Sylvester
FAO, Thailand

See eScienceNews: there is no human editor behind this news aggregation service. It is automated - and how is it done? It rests on the conditions for sharing data that Johannes mentioned earlier: opening up access to data, exposing the data in a format that is easily consumable, and describing it well (semantics) - the pillars that facilitate interoperability among data and data sources.

Johannes Keizer
FAO of the United Nations, Italy

eScienceNews is a harvester based on Drupal that screens press releases and blogs from all the important universities. The content is then indexed against an internal categorization scheme by a machine-run algorithm and displayed.

If the categorization scheme of eScienceNews were based on LOD vocabularies using URIs, it could automatically link further to other resources marked up or indexed with the same URIs.
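
A rough sketch of that linking step, in Python with rdflib (the concept URI is an example; this assumes the URI dereferences to RDF, as published AGROVOC URIs do, and that network access is available):

    from rdflib import Graph, URIRef, Namespace

    SKOS = Namespace("http://www.w3.org/2004/02/skos/core#")
    concept = URIRef("http://aims.fao.org/aos/agrovoc/c_14477")  # example concept

    g = Graph()
    g.parse(str(concept))  # fetch the RDF description the URI resolves to

    # Follow SKOS relations to neighbouring concepts; resources indexed
    # elsewhere with those URIs become reachable automatically.
    for broader in g.objects(concept, SKOS.broader):
        print("broader concept:", broader)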

John Fereira
Cornell University, United States of America

I am somewhat familiar with the eScienceNews system, and although I haven't looked at the underlying technologies the site uses for its implementation, I have a pretty good guess as to what it's doing. I suspect that it's using a system called OpenCalais (http://www.opencalais.com/), a web service that does a semantic analysis of "documents" using natural language processing, machine learning and other methods to provide entities/tags that are delivered back to the client, which can then be used to enhance the discovery of those documents by providing information on what a document is about.

When we're talking about where we can go in the future in sharing information, tools such as OpenCalais that let the machine do some of the work to improve interoperability and the discovery of information will become quite valuable. Another project that I am familiar with is the AgroTagger system, which essentially uses a similar text analysis approach and then applies AgroVoc terms for tagging the document.
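
This is not the real OpenCalais or AgroTagger API, but a toy Python sketch of the underlying idea: scan a text for known vocabulary labels and attach the matching concept URIs as tags. The label-to-URI table is an invented stand-in for a full AGROVOC lookup.

    # Invented label -> URI table; a real tagger would use the full AGROVOC.
    LABELS = {
        "maize": "http://aims.fao.org/aos/agrovoc/c_12332",
        "irrigation": "http://aims.fao.org/aos/agrovoc/c_3954",
    }

    def tag(text):
        """Return the concept URIs whose labels occur in the text."""
        lowered = text.lower()
        return {uri for label, uri in LABELS.items() if label in lowered}

    print(tag("Drip irrigation trials in maize fields"))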