Open Semantic Framework
The open semantic framework, or OSF, is a combination of a layered architecture and modular software. It is the most comprehensive framework within this OpenStructs site; many of the other aspects of OpenStructs are components within OSF. PDFThe open semantic framework represents the software component of the four-component total open solution. Some specific products (such as Citizen Dan, for example) are instantiations of this open semantic framework.
The design of the open semantic framework is the result of attempting to fulfill these objectives:
- Leverage existing information assets (data + structure) as much as possible
- Develop incrementally, and validate and justify as you go
- Emphasize, where possible, open standards and open software
- Employ Web-oriented architectures
- Adopt an open-world approach that acknowledges that information is most often incomplete; the approach is a key enabler for incremental deployments
- Use URIs as object identifiers, and use linked data where practical
- Embrace any data format found in the wild, but use RDF as the ultimate integration data model
- Design architectures and APIs that avoid "lock-in" and support multiple tools options across the stack
- Provide systems and capabilities that put all information sources -- text, media, semi-structured and conventional databases -- on an equal footing
- Promote designs that bring the ability to create useful results into the hands of users and decisionmakers; relegate IT to a support role.
The Incremental Layers of the Open Semantic Framework
The open semantic framework is a layered architecture, with clear APIs between layers as appropriate. The layered design can be likened to a pearl, where each subsequent layer does not embed the layer prior to it, and some layers actually may inter-operate with multiple layers below or above it (this is notably true for the "ontologies" layer, which has interactions up and down the stack).
Envision this pearl of the open semantic framework and its layers as follows:
The idea is that layers may accrete (as in the growth of a pearl) and occur over time and be uneven. Each layer, though, does have a role to play (though it may not be needed in a given deployment), and does act to augment existing information assets in the transition to a semantic framework. Beginning at the core, each of these layers -- with external references as appropriate for more details -- is described below.
Existing Assets Layer
The open semantic framework is premised on leveraging existing information assets. Once the framework is in place, new information can be brought into it in a direct, semantic manner. But, the real thrust and benefit of this framework is to provide an incremental pathway for inter-operating and federating existing data, structure and information assets.
These information assets may reside inside or outside the enterprise. They exist in many formats and are described by many schema. They may come from internal transaction systems or warehouses, or may exist external on the Web or at supplier or partner sites. These information assets may span from conventional databases and relational data systems to XML interchange standards, Web pages and standard internal text or documents. In short, there is no information asset that is not amenable to be included in this framework.
The Information Transformation (scones/irON) Layer
The information transformation layer provides either: 1) extraction of concepts and entities as structured metadata from source text or documents; or 2) conversion of existing data assets to interoperable form. Extractions may be conducted by either scones (Subject Concept or Named EntitieS) or third-party utilities. Conversions may occur via irON (instance record Object Notation) or third-party 'RDFizers'.
Depending on the source, the net result of the transformation is to produce interoperable data and information that can be ingested and used by other layers in the framework.
Though not strictly analogous, this layer bears some resemblance to the ETL (extract, transfer, load) utilities used in many enterprise information integration applications. Unlike those conventional systems, this information transformation layer also may capture and represent some of the source schema.
In all cases, however, these transformations are relatively simple and get parsed against the available structure (the ontologies, schema and entity reference lists) in the system to generate the semantic metadata (tags).
At this point, the extracted structure is generally at the level of instance records, or the ABox, with simple assertions of attribute-value pairs for specific records. Little schema transformation or mapping occurs at this layer (if such is needed, that occurs at the structWSF layer; see next). Actual federation or interoperation occurs at later layers based on the TBox structures 
This modular portion of the framework is explicitly designed with APIs to allow third-party tools to be plugged in and substituted.
The structWSF Layer
The major workhorse of the open semantic framework is the structWSF (Web services framework) layer. structWSF is the most complicated of the OSF layers and has many supporting software packages and capabilities. The structWSF layer provides the standard, common interface ("canonical") layer by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack.
structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies; see below).
The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The current structWSF framework comes packaged with a baseline set of about twenty Web services in CRUD, browse, search and export and import. All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and optionally a document of resultsets. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML. An internal representation, structXML, is used for internal communications across all structWSF Web services and with other layers.
structWSF has a central service that governs access rights and permissions. These rights occur at the level of the dataset, which gives immense flexibility to how data may be accessed, read, modified, created or deleted (or not). Datasets within a given structWSF instance may be accessed directly via API or via SPARQL queries to the instance's endpoint. Depending on rights and query, results sets may be returned from a given structWSF instance in an infinite variety of ways.
This latter capability is the essential interface for subsequent layers in the open semantic framework stack. Depending on those subsequent components, pre-staged data and results sets may be returned for an essentially limitless variety of purposes.
Each structWSF instance also has a unique Web address that enables one or a multitude of instances to communicate and share with one another. This simple, but elegant, method enables structWSF instances to participate or not in potentially global or restricted local networks and collaboration environments. This is currently the largest untapped potential of structWSF with respect to its existing deployments.
The Semantic Components Layer
The newest layer in the stack is the semantic components layer. This layer takes results sets — most often generated by a specific query or data slice request — from one or more structWSF instances and then presents that information via a variety of data visualization or data presentation widgets (what is called ‘semantic components‘ due to their design. The operation and sensitivity of these display components are themselves driven by a presentation and data analysis (including statistics) ontology.
Current display widgets include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable without modification to any domain.
As presently implemented, this layer consists either of Flex data visualization components or structured data display templates based on Smarty. The inherent design allows for updates to other bases (such as HTML5). The layer may also be swapped out or substituted with third-party capabilities.
The strength and power of this system is governed by its own ontology, the Semantic Component Ontology (SCO) (see next).
This is an extremely flexible layer in the open semantic framework stack.
The Ontologies Layer
The ontologies layer actually refers to all structured assets driving the system. As such, this layer might be considered the “brain” (though rather simply specified!) of the open semantic framework.
At a true schema or TBox level, the ontologies layer represents the concept and relationships of the domain at hand. This layer also hosts the specific local entities and prominent things (people, places, events, etc.) useful for extracting local and domain-specific relevance. However, those views are also supplemented with some administrative ontologies (two examples are SCO and irON) that guide how the user interfaces or widgets in the system should behave.
The concept level represents the “world view” of the specific instantiation of the open semantic framework at hand. This conceptual (TBox) view provides the structural organization of information, inferencing capabilities, and navigation, faceting and explorer structure. The entity (ABox) view provides tagging for prominent individuals and instances important to the domain at hand, and guides the structure behind data visualizations of attribute or indicator data.
The administrative level uses simple roles and relationships for attributes and indicators to inform the framework as to how and with what widget to display information. For example, a “type” of information that is geographically related can be instructed to use the map component as an option for display. Whether some information is used for totals, comparison purposes, or other specifications useful to data visualization and graphing may also be specified.
The language and relationships (predicates or properties) of these administrative ontologies are simple and straightforward. It is, for example, relatively easy to define data display functions at the broad dataset and attributes level. Simple determinations drive how results sets and their associated results types may be displayed, no matter what datasets or slices may be generated as a result of the queries or requests fed to the system.
The structure in these layers can be replaced by other structures for other instantiations and circumstances. Indeed, all other layers in the open semantic framework can remain relatively fixed while tailoring the instance to new domains solely via this layer. The ontologies layer is what gives any given instantiation of OSF — such as Citizen Dan — its unique focus and scope.
The Content Management System (conStruct) Layer
The thinnest layer (that is, least substantial with respect to this framework) is the content management system (CMS) layer. In its current form, the open semantic framework uses the Drupal CMS via the conStruct plug-in modules. The design of the framework, however, has explicitly accommodated the possibility that other CMSs may substitute for this role.
The CMS layer is optional if structWSF endpoints are sufficient or if simple Web pages hosting semantic components are deemed as adequate. Very small organizations or deployments may reasonably choose to have no CMS layer at all.
However, for most sites or portals with more than a few active users, it is desirable to have broad flexibility in theming (”skinning”), user rights and permissions, or other functionality. These are the roles of the CMS layer. Drupal, for example, is presently supported by more than 4500 third-party modules in every conceivable function, from polling to blogs and rating systems and bulletin boards.
For such generalized portals or collaboration environments, it makes sense to adopt and install a flexible CMS system, such as Drupal. Much of the user experience and functional environment can be provided through such means.
The open semantic framework is thus designed to reside easily in a CMS while also providing the hooks to take advantage of the generalized user rights and functionality of the CMS. In this manner, the open semantic framework is able to stay focused on its structured data and interoperability purposes, while still gaining the advantages of rich-featured content management systems.
The OSF is a Web-oriented Architecture
With its inherent open-world orientation and distributed and collaborative potential, the open semantic framework was designed from the outset to be Web-capable and Web-oriented:
A Web-oriented architecture (WOA) has a number of understood requirements, to which the open semantic framework adheres. Specifically, these design considerations support the framework as being part of WOA:
- Data and objects are all identified with Web addresses (URIs)
- Data is generally exposed (and universally available) as linked data
- SPARQL endpoints and APIs are generally RESTful in design
- The overall architecture is modular, with inherent decentralized and distributed aspects
- All display and visualization aspects are cross-browser ready and capable.
OSF is the Basis for Domain-specific Instantiations
Citizen Dan is specifically geared to local governments and localities, with an emphasis on community indicator systems (CIS). CIS have become a popular way of measuring and tracking measures of local economic and social well-being; they are closely related to sustainability and how to measure it as used in many economic and environmental domains.
However, in the context of OSF, what is really interesting about Citizen Dan is that its semantic framework is a completely open and generic one. The same set of tools and capabilities described on its details page can be applied to any domain that needs to manage and understand information in its own domain. This includes from unstructured text or documents to conventional structured databases.
What changes from domain to domain are the data structures (the ontologies, schema and entity reference lists; see above) that are fed to this open semantic framework. By swapping out new structures, what can be called Citizen Dan in one instance can morph to become Curriculum Carla in say, the education instance or Doctor Doolittle in the veterinary science instance
These multiple instances can be illustrated as follows:
What this figure illustrates is that even a branded expression of the framework — such as Citizen Dan — is merely an instance of that framework. And, actually, when expressed in such a packaged manner, it is more accurate to call the standard and bundled suite of generic functions and accompanying structure of Citizen Dan as an instantiation of the open semantic framework:
- (transitive) to represent an abstract concept by a concrete instance
- (transitive, object orientated computing) to create an object (an instance) of a specific class
By replacing the structure bases, and by tailoring the function suite appropriate to a given market and use, many instantiations of the open semantic framework may be created for different domains and markets. In this manner, Citizen Dan can be seen as an early exemplar of the framework, but not as a definiton of it.
OSF is the Software Leg to a ‘Total Open Solution‘
So far, this discussion has focused solely on considerations of software and architecture. Yet the power of the open semantic framework, highly useful in itself, is inadequate alone to achieve acceptance and success in the enterprise (for example, see this post). The very forces that are compelling enterprises to look at new options, are also the same ones that pose difficult hurdle rates for acceptance of open source.
A four-legged foundation to a total open solution addresses this issue. The solution involves software, structure, documentation and methods (or best practices). Each of these connect and relate to the other foundations.
The open semantic framework is a software (and architecture) leg to this foundation. Again, however, what is interesting is that the mere swapping out of the structure can also make the system relatively ready for other domains.
These relationships are shown in the following diagram, which also shows that the DocWiki portions of the solution embody the documentation (aside from code-level comments) and methods legs of the foundation:
Differences between domains may also lead to differences as to which components are included or not in that domain’s desired instantiation.
An important point from the diagram above is that the content and methods in the DocWiki may be nearly universal to other domains. Because the deltas between domains largely result from structure and what specific functional components are included or not, it becomes clear that most documentation and practices shared with the DocWiki will be applicable across domains. Though the use cases and some of the specific terminology may change, the remainder shows a high degree of re-usability of documentation and knowledge base across markets. This realization makes the usefulness and leverage of the DocWiki even higher.
- ↑ Citizen Dan is an open source system for aggregating different indicator data concerning local, community well-being. Information sources may include the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics. See current details with screenshots.
- ↑ A best practices approach makes explicit splits between the “ABox” (for instance data) and “TBox” (for ontology schema) in accordance with a working definition for description logics, a fundamental underpinning for how to use RDF:“Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”
- ↑ See M. K. Bergman, 2009. The Open World Assumption: Elephant in the Room, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say is it that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
- ↑ Of course, things are always not so simple as this. The CMS layer gives the open semantic framework the ready ability to change themes and layouts (”skins), not to mention the breadth and specifics of what ancillary site functionality might be provided. Moreover, the module basis of the open semantic framework also means that entire clusters of functionality might be dropped from a given instantiation (or added to it!) without violating or negating this framework.
- ↑ Dictionary references are from Merriam-Webster and Wikitionary.