
Amazon recently announced that their EC2 customers can now access a number of public datasets, including DBpedia, which contains 274 million RDF triples.  This is very cool news. It provides a great cloud-based resource for semantic reasoning over this public data, along with the ability to incorporate it into your own custom applications.

Here are just a few of the public datasets that are now available on Amazon Web Services.

DBpedia Knowledge Base provided by DBpedia.
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples).

Freebase Data Dump provided by Freebase.com.
A data dump of all the current facts and assertions in the Freebase system. Freebase is an open database of the world’s information, covering millions of topics in hundreds of categories.

Wikipedia Extraction (WEX) provided by Freebase.com.
The Freebase Wikipedia Extraction (WEX) is a processed dump of the English language Wikipedia.

Check out the Amazon site for more details on the public databases: http://aws.amazon.com/publicdatasets/

It is also interesting to note that the DBpedia folks recently announced links into the Freebase database that are expressed in their own RDF triples.  These links show up as RDF triples that use the owl:sameAs property, like this:

http://dbpedia.org/resource/Woody_Allen owl:sameAs  http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000004064f
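
One thing links like this make possible is treating the two URIs as names for the same thing when combining data from both sources. Here is a minimal Python sketch of that idea, assuming the triples have already been parsed into (subject, predicate, object) tuples. The helper name `same_as_groups` is made up for illustration and is not part of any DBpedia or Freebase tooling.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_groups(triples):
    """Group URIs connected by owl:sameAs links (hypothetical helper)."""
    parent = {}

    def find(uri):
        # Follow parent pointers to the representative URI of the group.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the path as we go
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            parent[find(subject)] = find(obj)  # merge the two groups

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())

# The triple shown above, written as a tuple.
triples = [(
    "http://dbpedia.org/resource/Woody_Allen",
    OWL_SAME_AS,
    "http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000004064f",
)]
groups = same_as_groups(triples)
```

Each set in `groups` holds URIs that can be treated as aliases for one entity, so facts keyed by either URI can be merged under a single record.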

These new links are provided in version 3.2 of DBpedia, which you can play around with directly using their SPARQL query endpoint located at http://dbpedia.org/sparql.  They also have a richer query interface with sample SPARQL queries here.
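
As a rough illustration of querying the endpoint, the Python sketch below builds a GET URL that asks for the owl:sameAs links of a resource. It relies on the standard SPARQL protocol `query` parameter. The function name `same_as_query_url` is my own, and the sketch only constructs the URL; it does not send the request or assume anything about the response format.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint mentioned in this post

def same_as_query_url(resource_uri):
    """Build a SPARQL GET URL for a resource's owl:sameAs links (illustrative)."""
    query = (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        f"SELECT ?other WHERE {{ <{resource_uri}> owl:sameAs ?other }}"
    )
    # urlencode percent-encodes the query text so it is safe in a URL.
    return f"{ENDPOINT}?{urlencode({'query': query})}"

url = same_as_query_url("http://dbpedia.org/resource/Woody_Allen")
print(url)
```

Pasting the printed URL into a browser, or fetching it with any HTTP client, should be enough to try the query against the live endpoint.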

Another post about the BBC! I’m not apologizing, but the reason I like to post on them is that in a number of past projects I’ve had the pleasure of working with several BBC teams. I always enjoy working with their teams because they are a very smart group and are always trying lots of cool cutting edge things.

This month I have been following the semantic technology work done by the /programmes team, the /music team (see previous post) and the Radio Labs team.

There’s nothing extremely groundbreaking about the concepts that they are employing. What I find most interesting is the combination of those concepts and their use in this domain.  My team has been pushing for many of these same concepts within our own solution offerings for the Media & Entertainment sector as well.

The core concepts of the Linked Data initiative seem to be fundamental to a lot of the new media work being done by the BBC.  The rules of “linked data” make a lot of sense with respect to media assets and Digital Asset Management.  Following them not only makes it easier to discover metadata about media, it also makes it easier to aggregate metadata from multiple sources and to discover such things as rights, copyright, and usage information.

The four rules of Linked Data are the following:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a resource through its URI, provide them some useful information.
  4. Include links to other URIs, so that they can discover more things.

These rules seem very simple, and in fact applying them when designing a next-generation Digital Asset Management solution is not very difficult (although there are some design issues that you should be aware of).

Exposing the assets or “entities” via a well-known URI is fundamental to the ability to link together things that were not originally designed or intended to be linked. Following these simple principles allows for easy “mash-ups” for consumer and B2B oriented media metadata exchange.  When asset management systems are built this way, it becomes much simpler to build out program guides, EPGs, and content aggregation portals like Hulu, Joost, and the BBC iPlayer.

Another aspect of the BBC’s work that I am interested in is their use of OWL ontologies to define their domain model.  The Programmes ontology defines the vocabulary for describing a brand, series or episode on the BBC’s web properties.  Like similar ontologies in the Semantic Web, it reuses or imports vocabulary from existing well-known ontologies such as Dublin Core and Friend of a Friend (FOAF).  This is the approach that I took with the development of the IMM Core ontology in the Interactive Media Manager (IMM) solution for Digital Asset Management.  Our domain model was based on the abstract model for the MPEG-21 Digital Item Declaration language, which I will cover in a future posting.

In the case of IMM, a lot of our URI designs followed rule #1; however, we failed to effectively implement rules 2 and 3. Our URIs all begin with http://, but because SharePoint is the underlying platform, we did not provide a simple way to make our resource URIs land on a page that showed meaningful data or links to related URIs.  This is something that should be simple to address in a future release.

I really appreciate the way the BBC service is offering persistent URIs with the ability to adjust the serialization by adding a simple extension such as .xml, .yaml, .rdf, or .json to the end of the URI.  The BBC team’s underlying development should serve as a best practice for the media industry and will lead to the growth of the Semantic Web through linked data. I’d love to see all M&E companies begin to embrace these patterns.
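
To make the extension pattern concrete, here is a small Python sketch that derives the URI of a particular serialization from a persistent resource URI. The function name `representation_uri` and the example.org URI are placeholders of mine, not the BBC's actual code or addresses; only the list of extensions comes from the description above.

```python
SERIALIZATIONS = ("xml", "yaml", "rdf", "json")  # extensions listed in this post

def representation_uri(resource_uri, fmt):
    """Append a serialization extension to a persistent URI (illustrative)."""
    if fmt not in SERIALIZATIONS:
        raise ValueError(f"unsupported serialization: {fmt}")
    # Drop any trailing slash so the extension attaches to the identifier.
    return f"{resource_uri.rstrip('/')}.{fmt}"

# Placeholder URI; a real programme URI would go here.
episode = "http://example.org/programmes/b0000000"
rdf_uri = representation_uri(episode, "rdf")
json_uri = representation_uri(episode, "json")
```

The appeal of this design is that the persistent URI stays the same for every consumer, while the choice of format is a trivial, predictable transformation of it.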

I’ve been working with RDF Gateway from Intellidimension in our solution development for the last couple of years. Today they released what I believe is a significant performance enhancement to their already fantastic Semantics.Server solution for storing and retrieving RDF triples in SQL Server 2008.  Also new in this release is an update to the SDK that provides an Entity framework on top of the existing SDK.  The new Entity framework simplifies the way you work with business entities in your semantic applications. I’ll try to put together a longer post later that shows how these technologies are used.

The BBC announced yesterday that they are now testing a new Music knowledge base service that aggregates content from the open source DBpedia and other sources.

This is very exciting from my perspective, since my team has been working with Semantic Web technologies (RDF, OWL, SPARQL) for the last couple of years to solve problems in the Digital Asset Management space.

I strongly believe that open standards and ontologies are the correct approach to sharing metadata and interoperating on the web, and that they will lead to more connected media and consumer experiences in the future.
