OSF Architecture
The Open Semantic Framework (OSF) architecture consists of several layers and a number of open source components, which are described below. (A related product slide show from Structured Dynamics describes some of these components in more detail.)
Overall Architecture
Here is a simple view of the overall OSF architecture, with certain examples and contributing components highlighted:
To try out the full stack yourself, please visit the Citizen Dan sandbox, which is a working demonstration of OSF in the context of local governments and community indicator systems.
The Paradigm of Ontology-Driven Applications
By combining the richness of existing information structure with semantic technologies, the OSF provides a new paradigm in information technology: ontology-driven applications, or ODapps. Ontology-driven applications are modular, generic tools, which operate and present results to users based on the underlying structures that feed them.
The ontology-driven paradigm replaces the current brittle and specific approaches to application code development, query formulation and report writers. Instead, attention shifts to the structure and organization of the information itself, a focus that democratizes the knowledge management process.
The embracing context for ODapps is the Open Semantic Framework. OSF is a combination of a layered architecture and modular software. Most of the items below are contributing parts to the OSF.
The OSF may be deployed in its entirety or by mixing and matching its parts. It fully encompasses the product architecture shown above. The relationship of the various components and layers that make up the framework is depicted by what some have affectionately called the "semantic muffin" (above).
The software and tools associated with the OSF are all open source, and are available for download from the OpenStructs.org Web site. This entire TechWiki is also devoted to explaining various aspects of this overall system.
The Application Layer
The application layer consists of the content management system (CMS) and data visualization and management widgets.
CMS Sub-layer
OSF currently provides a user application layer via conStruct, a structured content system built on Drupal. conStruct enables structured data and its controlling vocabularies (ontologies) to drive applications and user interfaces. It is based on RDF and the structWSF platform-independent Web services framework (below).
Users and groups can flexibly access and manage any or all datasets exposed by the system depending on roles and permissions. Report and presentation templates are defined, styled or modified based on the underlying datasets and structure. Collaboration networks can be established across multiple installations and non-Drupal endpoints. Linked data integration can be included to embrace data anywhere on the Web.
conStruct provides Drupal-level CRUD, data display templating, faceted browsing, full-text search, and import and export over structured data stores based on RDF.
Data Visualization and Manipulation Sub-layer
The application layer also includes various open source semantic components (sComponents), which are Flex-based data visualization, display and manipulation widgets tailored to structWSF data. These components may be used directly in HTML pages or embedded in conStruct.
The sComponent library currently includes its own ontology and control object. Representative widgets include the concept (relation) browser, search, story viewer, map objects, image, box object, text object, pie chart, bar chart and linear chart.
The Ontology ('Schema') Layer
Ontologies are the key structures that provide the horsepower behind these ontology-driven applications. Because these data-driven adaptive ontologies take on expanded duties in Web deployment and user interfaces, they have added requirements:
- Linked data, and the use and accessibility of URIs as resource identifiers
- Workflow considerations with explicit treatment of user edits and candidate suggestions
- Context- and instance-sensitive data display, including templates, and
- Preferred and alternate labels for data objects to "drive" user interfaces.
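The last requirement can be illustrated with a minimal Python sketch. It is not OSF code: the `Concept` class, its field names and the example URI are hypothetical, chosen only to show how a preferred label might be displayed while alternate labels still match what a user types.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for an ontology concept with UI labels."""
    uri: str
    pref_label: str
    alt_labels: list = field(default_factory=list)


def display_label(concept: Concept) -> str:
    """Return the label a user interface would show for the concept."""
    return concept.pref_label


def matches(concept: Concept, user_text: str) -> bool:
    """True if the text equals the preferred or any alternate label, ignoring case."""
    wanted = user_text.strip().lower()
    labels = [concept.pref_label, *concept.alt_labels]
    return any(wanted == label.lower() for label in labels)


# Illustrative data only; the URI is made up.
park = Concept(
    uri="http://example.org/ontology#Park",
    pref_label="Park",
    alt_labels=["Green space", "Public garden"],
)
```

Here a search box could accept "green space" and still present the result under the single preferred label "Park".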
A Common Web Services Interface
An essential interface layer is the mediator between existing data assets and structure and the interoperability provided by adaptive ontologies. This layer needs to communicate and present clear semantics at the interoperable side of the interface. It needs to accept and convert a diversity of data, structures and schema.
The structWSF Web services framework provides this interface layer. structWSF is platform-independent middleware for accessing and exposing structured RDF data, with generic tools driven by underlying data structures. Its central perspective is that of the dataset. Access and user rights are granted around these datasets, making the framework enterprise-ready and designed for collaboration. Since a structWSF layer may be placed over virtually any existing datastore with Web access - including large instance record stores in existing relational databases - it is also a framework for Web-wide deployments and interoperability.
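The dataset-centric view of access rights can be sketched in a few lines of Python. This is an assumption-laden illustration, not the structWSF access model: the `Permission` class, the `ACCESS` table, the group names and the dataset URI are all hypothetical.

```python
from dataclasses import dataclass

ACTIONS = ("create", "read", "update", "delete")


@dataclass(frozen=True)
class Permission:
    """Hypothetical CRUD flags granted to one group on one dataset."""
    create: bool = False
    read: bool = False
    update: bool = False
    delete: bool = False


PARKS = "http://example.org/datasets/parks"  # made-up dataset URI

# Rights are keyed by (dataset, group), so the dataset is the unit of access.
ACCESS = {
    (PARKS, "public"): Permission(read=True),
    (PARKS, "editors"): Permission(create=True, read=True, update=True),
}


def is_allowed(dataset: str, group: str, action: str) -> bool:
    """Check whether a group may perform a CRUD action on a dataset."""
    if action not in ACTIONS:
        return False
    permission = ACCESS.get((dataset, group))
    return permission is not None and getattr(permission, action)
```

Keying the table on the dataset rather than on individual records keeps the rule set small, which is one plausible reason to organize rights this way.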
The structWSF framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The core structWSF framework comes packaged with a baseline set of about twenty Web services in CRUD (create - read - update - delete), browse, full-text and faceted search, and export and import (multiple supported formats). More services can readily be added to the system, including advanced analytics, ontology creation and management, and data visualization. All Web services are exposed via APIs and SPARQL endpoints.
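Because the services are reached over plain HTTP, a client mostly needs to compose URLs. The Python sketch below only builds a request URL and makes no network call. The base URL, the service name and the parameter names are hypothetical placeholders; consult the structWSF documentation for the actual endpoints and parameters.

```python
from urllib.parse import urlencode

# Hypothetical base address of a Web services installation.
BASE_URL = "http://example.org/ws"


def build_request(service: str, params: dict) -> str:
    """Compose a GET URL for a named service with URL-encoded parameters."""
    # Sorting keeps the query string stable regardless of dict ordering.
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{service.strip('/')}/?{query}"


# Illustrative call: a full-text search restricted to one (made-up) dataset.
url = build_request(
    "search",
    {"query": "parks", "datasets": "http://example.org/datasets/parks"},
)
```

The resulting string could be passed to any HTTP client, which is the practical benefit of a generally RESTful design.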
The tools within structWSF (or those designed to interoperate with it) are different from traditional applications. They are designed to have generic functionality, the specific operation and expression of which is based on the inherent structure within the data and its relationships. This design approach is closer to Web 2.0 "mashup" designs, which emphasize APIs and protocols.
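The idea of a generic tool whose behavior follows the structure of the data can be shown with a small Python sketch. It is an illustration under assumptions, not how OSF selects widgets: the `choose_widget` function, the attribute names `lat` and `long`, and the selection rules are hypothetical.

```python
def choose_widget(records: list) -> str:
    """Pick a display widget from the attributes shared by all records."""
    if not records:
        return "text"
    # Only attributes present in every record can drive the display.
    shared = set.intersection(*(set(record) for record in records))
    if {"lat", "long"} <= shared:
        return "map"
    # Any attribute that is numeric in every record can be charted.
    for key in shared:
        if all(isinstance(record[key], (int, float)) for record in records):
            return "bar chart"
    return "text"
```

The same function serves every dataset. Adding coordinates to the records changes the presentation without any change to the application code, which is the essence of the ontology-driven approach described above.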
The Conversion, Extraction and Authoring Layer
Other engines complement these ontology-driven applications. These engines provide information extraction, RDF conversion of legacy data structures, and simple dataset authoring and exchange formats.
Information extraction (IE) is important because, by commonly cited estimates, 80% to 85% of all information resides in unstructured text. Metadata tagging through IE enables faceting, finding named entities, and inferencing over conceptual relationships. The OSF information extraction engine is scones (Subject Concepts Or Named EntitieS). It uses relatively simple natural language processing (NLP) methods, informed by concept ontologies and named entity (instance record) dictionaries, to guide the extraction process. The co-occurrence of matches between concepts and entities also aids the disambiguation task. The resulting scones tags can be managed separately, fed to user interfaces, or re-injected into the original content as RDFa.
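Dictionary-guided tagging of this kind can be approximated in a few lines of Python. This sketch is not the scones implementation: the `tag` function, the two dictionaries and their URIs are hypothetical, and the matching is a plain case-insensitive whole-phrase lookup with no disambiguation step.

```python
import re

# Hypothetical lookup tables: label -> URI.
CONCEPTS = {
    "park": "http://example.org/ontology#Park",
    "library": "http://example.org/ontology#Library",
}
ENTITIES = {
    "central park": "http://example.org/entities/Central_Park",
}


def tag(text: str, concepts: dict, entities: dict) -> list:
    """Return sorted (start, end, kind, uri) tuples for every dictionary match."""
    tags = []
    for kind, dictionary in (("entity", entities), ("concept", concepts)):
        for label, uri in dictionary.items():
            # Match the label as a whole word or phrase, ignoring case.
            pattern = r"\b" + re.escape(label) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                tags.append((match.start(), match.end(), kind, uri))
    return sorted(tags)


tags = tag("Central Park has a new library.", CONCEPTS, ENTITIES)
```

In this example the concept "park" also matches inside the entity "Central Park". A fuller pipeline would use such co-occurring matches as evidence when deciding which tags to keep.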
For RDF conversion ("RDFizers"), OSF includes built-in converters and can also draw on more than 150 third-party format options. Many of these converters can also work directly with major application APIs.
For dataset authoring, OSF also includes an instance record and object notation that can be serialized as JSON (called irJSON), XML (called irXML) or comma-separated values (CSV files, called commON). These notations provide easier authoring environments and scripting support for RDF-ready datasets, and they shield users from the nuances of RDF. The design of commON is especially geared to using spreadsheets as authoring environments for instance record tables or simple outline structures.
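The spreadsheet-to-records workflow can be sketched with Python's standard library. The example deliberately uses plain CSV and plain JSON: it does not reproduce the actual commON or irJSON syntax, and the column names, the `records` key and the sample rows are made up for illustration.

```python
import csv
import io
import json

# Made-up spreadsheet export: one row per instance record.
SAMPLE = (
    "id,type,prefLabel,altLabel\n"
    "park_1,Park,Riverside Park,River Park\n"
    "lib_1,Library,Main Library,\n"
)


def csv_to_records(csv_text: str) -> list:
    """Turn each CSV row into a record dict, omitting empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {column: value for column, value in row.items() if value}
        for row in reader
    ]


records = csv_to_records(SAMPLE)
# Serialize the same records as JSON for exchange with other tools.
as_json = json.dumps({"records": records}, indent=2)
```

An author edits only the table. The conversion step, and any later mapping to RDF, stays out of sight.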
The Starting Layer: Existing Assets
Of course, in all cases there are existing systems hosting and managing the enterprise's core information assets. OSF is specifically designed to leverage these assets.