I had the chance to attend two sessions on Freebase and the Google Knowledge Graph during Google I/O 2013.
The first session was a general introduction to the linked data concept and a presentation of the work done around schema.org, a platform developed by Bing, Google, Yahoo! and Yandex in partnership with the W3C to develop standards around linked data and the semantic web.
Schema.org is like a massive dictionary and markup model that describes various entities through a huge collection of properties. You can browse the full list of entities described from this page. This session also presented webmaster tools to
- use information from schema.org
- test how data will render on different search engines
- see usage of your data
- improve your data markup
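To make the "entity plus properties" idea concrete, here is a minimal sketch that builds a schema.org description as JSON-LD, one of the formats schema.org markup can be written in. The helper name `make_entity` and the sample values are my own assumptions for illustration and were not shown in the session.

```python
import json


def make_entity(schema_type, **properties):
    """Return a schema.org entity as a JSON-LD dictionary.

    `make_entity` is a hypothetical helper, not part of any schema.org tooling.
    """
    entity = {"@context": "https://schema.org", "@type": schema_type}
    entity.update(properties)
    return entity


# Illustrative values: a city contained in a larger administrative area.
city = make_entity(
    "City",
    name="Toronto",
    containedInPlace={"@type": "AdministrativeArea", "name": "Ontario"},
)

# Serialized form that could be embedded in a page and checked with a testing tool.
markup = json.dumps(city, indent=2)
print(markup)
```

The resulting JSON would typically be placed in a `<script type="application/ld+json">` tag.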
The second session was more focused on Freebase, and on how anyone can leverage the knowledge and data stored in it using its API. For example, the Freebase search widget is ready-to-integrate code for browsing Freebase entities given a specific category and a keyword (through the search function). The widget returns a list of possible matches that the user can browse by hovering over the hits in the drop-down list. If used in a production environment, the API results should be constrained, as the API can return
- way too many matches
- information on matches that are not useful in your use case.
For example, after playing a bit with the API, the following query returns 20 cities in Ontario:
I invite you to read the cookbook to master the various constraints and query parameters, as the API is flexible enough to query or display only certain fields. You can see more uses of the Freebase search API by checking the documentation.
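As a rough sketch of what a constrained search request can look like, the snippet below only builds the request URL and does not call the service. The endpoint, the `filter` and `limit` parameter names, and the filter expression are written from my reading of the search documentation and cookbook, so treat them as assumptions; the function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Freebase Search API (Google APIs, v1).
SEARCH_ENDPOINT = "https://www.googleapis.com/freebase/v1/search"


def build_search_url(query="", filter_expr=None, limit=20, api_key=None):
    """Build a search URL with optional constraints (hypothetical helper)."""
    params = {"limit": limit}          # cap the number of matches returned
    if query:
        params["query"] = query        # free-text keyword
    if filter_expr:
        params["filter"] = filter_expr  # constrain by type, location, etc.
    if api_key:
        params["key"] = api_key
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Assumed filter syntax: entities typed as city/town that are part of Ontario.
url = build_search_url(
    filter_expr="(all type:/location/citytown part_of:ontario)",
    limit=20,
)
print(url)
```

Keeping the limit and the filter explicit addresses both problems listed above: too many matches, and matches of the wrong kind.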
Using the topic API, one can quickly retrieve all the information available on Freebase for a specific entity. This gets interesting once you know that Freebase also stores links to other well-known web pages and social media (Twitter, Facebook, Wikipedia …) for most topics (people, organizations, locations).
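A topic lookup can be sketched the same way. Again, this only assembles the URL; the endpoint, the repeated `filter` parameter, the topic id `/en/toronto` and the property `/common/topic/topic_equivalent_webpage` are assumptions used for illustration, and `build_topic_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint of the Freebase Topic API (Google APIs, v1).
TOPIC_ENDPOINT = "https://www.googleapis.com/freebase/v1/topic"


def build_topic_url(topic_id, filters=None, api_key=None):
    """Build a topic URL, optionally restricted to some properties."""
    if not topic_id.startswith("/"):
        raise ValueError("topic_id must start with '/', e.g. '/en/toronto'")
    # Each filter narrows the response to one property or domain.
    params = [("filter", f) for f in (filters or [])]
    if api_key:
        params.append(("key", api_key))
    url = TOPIC_ENDPOINT + quote(topic_id, safe="/")
    if params:
        url += "?" + urlencode(params)
    return url


# Everything known about the topic.
print(build_topic_url("/en/toronto"))

# Only the links to equivalent web pages for the same topic.
print(build_topic_url("/en/toronto", ["/common/topic/topic_equivalent_webpage"]))
```

Restricting the response with a filter is a simple way to pull out just the external links instead of the full topic.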
This session was also an opportunity to ask questions about the methodology used to maintain data quality on Freebase (since information is pulled from Wikipedia and any user can submit changes and new information). We learned that Freebase:
- flags duplicates for a merge process
- waits two weeks before integrating Wikipedia changes, so the Wikipedia community has time to moderate them (Freebase / Google and Wikipedia have developed a formal relationship around this project)
- keeps track of all changes with user id and timestamp, so it can flag and / or revert submissions that don’t meet Freebase standards (the moderation system is similar to Wikipedia's)