StructWSF Python Wrapper Tutorial
From TechWiki
This is a tutorial for a structWSF Python wrapper called BKN_WSF, which consists of a single file, bkn_wsf.py. The file can be imported into a Python script or used as a web service by JavaScript or PHP apps. Use as a web service is not covered here (see the source code for documentation). BKN_WSF was created by the Bibliographic Knowledge Network (BKN), a loose coalition of partners initially funded by the NSF Cyber-enabled Discovery and Innovation (CDI) Program (award 0835851). The BKN mission is to build free tools and applications for researchers to distill, normalize, reconcile, publish, and analyze large open bibliographic datasets, and to develop a network of websites and web services participating in this activity.
This document uses a few interchangeable terms,
OVERVIEW
bkn_wsf.py includes wrappers and examples for using structWSF, a framework for interacting with an RDF repository. Reviewing the structwsf documentation is recommended to better understand bkn_wsf. Not all structwsf functionality is wrapped by bkn_wsf, but any structwsf functionality can be called from bkn_wsf.py. For more information about structWSF usage, see openstructs.org/structwsf.
structwsf supports a variety of data formats (RDF, XML, JSON) and converts the data to and from RDF. bkn_wsf is tailored to use JSON data with structwsf, and in particular a format structwsf calls irJSON. This irJSON format is what you typically think of when you work with JSON data. Some structwsf responses are in a JSON structure that would be more familiar to someone working with RDF. BKN projects usually refer to the irJSON format as bibJSON.
bkn_wsf.py simplifies use of structwsf and handles the conversions required to use structwsf with irJSON (bibJSON). BKN projects use bkn_wsf to import large datasets into structwsf instances/repositories. bkn_wsf can also be used as a proxy web service by JavaScript apps (see method BKNWSF.web_proxy_services). Every structwsf instance used by BKN has a corresponding Drupal site with a module called conStruct that integrates structwsf with Drupal. The Drupal site is used for various management tasks, and can be used to search and browse data. http://people.bibkn.org is the first structwsf instance created for BKN. There are now over one million records in the repository. BKN does not make use of most Drupal/conStruct functionality, which includes the ability to skin the display of different types of data.
INSTALLATION
bkn_wsf.py is a single Python file. The source control repository is currently at,
github.com/bkn/bkn_wsf
From there you can browse the source and download code by clicking the Download Source button at the top right which pops up the download url,
github.com/bkn/bkn_wsf/tarball/master
There may be other files but bkn_wsf.py is the only relevant file. test.py and in.json may be useful for testing.
The primary location for the code may change. All BKN projects have a link to source code from a google code project called bibkn, code.google.com/p/bibkn/w/list. A url to the primary version of bkn_wsf should always be on the corresponding wiki page, code.google.com/p/bibkn/wiki/bkn_wsf
bkn_wsf.py is not useful without a structwsf repository, and write access requires that you have a user account with the structwsf repository. Ask your system administrator to create an account.
Examples in this document refer to a BKN repository at datasets.bibsoup.org. This is a real repository but it is used in this document only as an example. This document assumes you have access to your own repository. However, if you are working with bibliographic data you can request an account by emailing info@bibkn.org.
Permissions
To use a dataset created by someone else, the creator must give you permission. On the Drupal site, depending on your permissions role, you may be able to 'join' the dataset. Go to the "Datasets" page and click on the "join" link, which should go to a url like datasets.bibsoup.org/og/. In any case, contact the dataset creator to set your permissions. See the DATASET CREATE section for instructions on how to set permissions for a dataset you create.
To confirm setup you can run the autotest method with the specified repository root url. For example,
The test performs reads and writes and prints a line for each step. The final step displays the content of a dataset that gets removed after the test. The last output should be a pass/fail statement. The output should show a dataset named 'dataset_test', with a record that has a '1' for the 'id' value and 'update' for the 'name' value. If the test fails after creating the dataset, the test should clean up by removing the dataset.
The method Test.wsf_test() is a hodgepodge of commented out example calls. This is sometimes useful for testing specific operations, or cleaning up test datasets. The file test.py calls this method.
USAGE OVERVIEW
bkn_wsf.py is intended to make it easy to use bibjson/irjson data. There is a performance penalty for the ease of use: many structwsf services do not return irjson data, so bkn_wsf requests 'text/xml' responses, which are then sent to the structwsf service '/converter/irjson/'.
bkn_wsf wraps structwsf services but any service with any parameters can be called from bkn_wsf using the BKNWSF.structwsf_request method which is how all bkn_wsf methods interact with structwsf. For instance, a search sets parameters and executes,
structwsf authentication is handled by associating an ip address with a user account on the Drupal site associated with the structwsf repository. structwsf operations that require permissions take a parameter with the ip address of the user. The BKNWSF.structwsf_request method automatically detects the user's ip address and adds the 'registered_ip=' parameter to the structwsf service call. (The ip address is detected by calling bkn_wsf.py as a web service. This is necessary for cases like client-side JavaScript apps because structwsf needs the externally visible ip address, which may not be what the browser detects on the client side.)
Most BKN datasets have public read-only permissions. Login is typically required for write access to a dataset, and to read private datasets.
Classes
There are three primary classes: BKNWSF, Dataset, and Record.
BKNWSF - manages the structwsf instance root uri, and includes most methods for operations which are not specific to a dataset (search, browse). Among the methods is structwsf_request which is the method that handles all calls to structwsf services.
Dataset - manages the dataset uri and its related parts (root, id), and includes methods for operations on a single dataset like create, read, update, access, list, ...
Record - manages the record uri and id, and includes methods for operations on a record like create, update, read ...
Source code documentation is available in HTML format at services.bibsoup.org/doc/bkn_wsf/
Error Handling
Most bkn_wsf.py errors are not caught, so you will see the Python error on stdout. Some logging functionality is implemented, and more is planned for the near future. Errors from structwsf are propagated and returned as a response to calls, and the BKNWSF.structwsf_request method catches errors. Two common errors are,
403 FORBIDDEN - The user doesn't have permission to perform the operation. When a dataset is created using bkn_wsf.py, default permissions are set for creator read/write. structwsf has very rich access control settings. bkn_wsf.py just sets defaults, and currently doesn't provide any special methods to facilitate adding users to a dataset. Permissions management is currently performed using the Drupal interface. More access control functionality is on the short-list for bkn_wsf.py.
400 BAD REQUEST - This is something of a catch-all. It could occur because the structwsf call is not properly formatted or a parameter value is incorrect. For instance, reading a record id that does not exist, or not setting the structwsf root, will return this error.
SETUP
This tutorial is a walkthrough based on the Test.autotest method in bkn_wsf.py. Before attempting any operation, the structwsf instance (repository) root must be set. bkn_wsf uses the root to construct roots for structwsf service calls and for datasets. For example,
BKNWSF.set(root,'root')
Service.set(BKNWSF.get()+'ws/','root')
Dataset.set(BKNWSF.get()+'datasets/','root')
structwsf calls typically require a dataset uri and/or a record uri. bkn_wsf will use the current Dataset root to construct a uri given an id and vice versa, and handle a uri that is preceded with one or more '@' symbols (which are often the format returned by structwsf responses). In each of the cases below, the Dataset uri and id are set and the method returns the dataset uri.
Dataset.set('http://datasets.bibsoup.org/wsf/datasets/dataset2')
Dataset.set('@@http://datasets.bibsoup.org/wsf/datasets/dataset2')
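The uri handling described above can be pictured with a small standalone sketch. It does not call bkn_wsf; the helper names strip_ref_prefix, uri_to_id, and id_to_uri are hypothetical, the root is taken from the example uris above, and the trailing slash added by id_to_uri is an assumption.

```python
# Illustrative only: these helpers are hypothetical and are not part of bkn_wsf.py.
DATASET_ROOT = 'http://datasets.bibsoup.org/wsf/datasets/'


def strip_ref_prefix(uri):
    """Remove any leading '@' symbols that structwsf responses put before a uri."""
    return uri.lstrip('@')


def uri_to_id(uri, root=DATASET_ROOT):
    """Return the id part of a uri that sits under the given root."""
    uri = strip_ref_prefix(uri)
    if not uri.startswith(root):
        raise ValueError('uri is not under root: ' + uri)
    return uri[len(root):].strip('/')


def id_to_uri(dataset_id, root=DATASET_ROOT):
    """Build a full uri from a bare id (trailing slash is an assumption)."""
    return root + dataset_id.strip('/') + '/'


print(uri_to_id('@@http://datasets.bibsoup.org/wsf/datasets/dataset2'))
print(id_to_uri('dataset2'))
```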
Dataset.get has a few uses. By default Dataset.get() returns the uri. A key can be passed to return parts of the uri, 'id' and 'root'. There are two other special keys. The 'record_data' key is described in the "dataset record list" section below. The 'template' key is used to get a bibjson/irjson formatted template. The template is used with Record operations.
template = Dataset.get('template')
A dataset template looks like,
"id": "",
"schema": [
"http://downloads.bibsoup.org/datasets/bibjson/identifiers.json",
"type_hints.json",
"http://downloads.bibsoup.org/datasets/bibjson/bibjson_schema.json",
"http://www.bibkn.org/drupal/bibjson/bibjson_schema.json"
],
"linkage": ["http://www.bibkn.org/drupal/bibjson/iron_linkage.json"]
}
SEARCH AND BROWSE
The search and browse methods are in the BKNWSF class because they can be called to return data from multiple datasets. So there is not necessarily a specific active dataset when a browse or search call returns. (An alternative location for these methods could be the Dataset class, which is now intended for operations on a single dataset.) While search/browse can take a list of datasets, they will use the active dataset if no dataset uri is specified. To specify all datasets, use 'all' as the uri parameter. There is functionality in structwsf to filter by type and attribute. Filter functionality is not yet wrapped by bkn_wsf but you can pass additional structwsf parameters to most bkn_wsf methods.
Search and Browse take parameters for page size and page number. To page through results set whatever page size you want and increment the page number starting at 0.
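The paging pattern can be sketched without a repository. In the sketch below, fetch_page is a hypothetical stand-in for a call such as BKNWSF.search with a page size and page number, and stopping on an empty page is an assumption about how the end of the results is detected.

```python
# Illustrative only: fetch_page stands in for a paged search or browse call.
def fetch_page(records, page_size, page_number):
    """Return one page of records; page numbers start at 0."""
    start = page_number * page_size
    return records[start:start + page_size]


def iterate_pages(records, page_size):
    """Yield pages until an empty page signals the end (assumed stop condition)."""
    page_number = 0
    while True:
        page = fetch_page(records, page_size, page_number)
        if not page:
            break
        yield page
        page_number += 1


sample = ['record%d' % i for i in range(7)]
pages = list(iterate_pages(sample, 3))
for number, page in enumerate(pages):
    print(number, page)
```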
Below are some example calls and distinct sections of an example response.
BKNWSF.search('pitman','all',25, 1) # the second page of results
dataset_uri = 'http://people.bibkn.org/wsf/datasets/mgp_id/'
BKNWSF.search('pitman', dataset_uri)
Dataset.set(dataset_uri)
BKNWSF.search('pitman')
BKNWSF.browse()
BKNWSF.browse('demo',20, 0)
The record data is in the 'recordList' object.
{
"status": "test",
"target": "8/31/10",
"some_id": "a17469",
"action": "go",
"type": "Person",
"id": "http://datasets.bibsoup.org/wsf/datasets/demo/d1",
"name": "Jack Alves"
},
{
"type": "Object",
"id": "http://datasets.bibsoup.org/wsf/datasets/demo/d2",
"name": "Jim Pitman",
"title": "Open Data Evangelist"
}
]
The response also returns metadata for each type or property. This metadata can be used for faceted displays. The 'aggregate' object contains the metadata.
{
"count": "1",
"property": {
"ref": "@@http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
},
"type": "Aggregate",
"id": "http://datasets.bibsoup.org/wsf/ws/browse/aggregate/f3a5a47d1d93f7139a474ee283e00236/dbd0b1bcc44f872d4ccdd4f9f0133a63/",
"object": {
"ref": "@@http://datasets.bibsoup.org/wsf/ontology/types/Object"
}
},
{
"count": "1",
"property": {
"ref": "@@http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
},
"type": "Aggregate",
"id": "http://datasets.bibsoup.org/wsf/ws/browse/aggregate/f3a5a47d1d93f7139a474ee283e00236/dc26c336cc7640eccf2dba04e6b27ecd/",
"object": {
"ref": "@@http://datasets.bibsoup.org/wsf/ontology/types/Person"
}
}
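One way to turn the 'aggregate' entries into counts for a faceted display is sketched below. The function name facet_counts is hypothetical, and the sample data is a trimmed copy of the entries shown above (the 'id' values are left out).

```python
from collections import defaultdict


def facet_counts(aggregates):
    """Group aggregate entries as {property uri: {object uri: count}}."""
    facets = defaultdict(dict)
    for entry in aggregates:
        # References in responses are prefixed with '@@'; strip the prefix.
        prop = entry['property']['ref'].lstrip('@')
        obj = entry['object']['ref'].lstrip('@')
        facets[prop][obj] = int(entry['count'])
    return dict(facets)


RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
aggregates = [
    {"count": "1", "type": "Aggregate",
     "property": {"ref": "@@" + RDF_TYPE},
     "object": {"ref": "@@http://datasets.bibsoup.org/wsf/ontology/types/Object"}},
    {"count": "1", "type": "Aggregate",
     "property": {"ref": "@@" + RDF_TYPE},
     "object": {"ref": "@@http://datasets.bibsoup.org/wsf/ontology/types/Person"}},
]

facets = facet_counts(aggregates)
for type_uri, count in sorted(facets[RDF_TYPE].items()):
    print(type_uri, count)
```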
The dataset section shows mapping for types and attributes.
"linkage": [
{
"typeList": {
"Aggregate": {
"mapTo": "http://purl.org/ontology/aggregate#Aggregate"
},
"Person": {
"mapTo": "http://datasets.bibsoup.org/wsf/ontology/types/Person"
},
"Object": {
"mapTo": "http://datasets.bibsoup.org/wsf/ontology/types/Object"
}
},
"linkedType": "application/rdf+xml",
"attributeList": {
"status": {
"mapTo": "http://datasets.bibsoup.org/wsf/ontology/properties/status"
},
"count": {
"mapTo": "http://purl.org/ontology/aggregate#count"
},
...
"object": {
"mapTo": "http://purl.org/ontology/aggregate#object"
},
"isPartOf": {
"mapTo": "http://purl.org/dc/terms/isPartOf"
},
"property": {
"mapTo": "http://purl.org/ontology/aggregate#property"
},
"name": {
"mapTo": "http://datasets.bibsoup.org/wsf/ontology/properties/name"
}
}
}
]
}
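The 'attributeList' mapping can be used to translate the short attribute names in a record into full property uris. The sketch below shows one way to do that; expand_attributes is a hypothetical helper, and leaving unmapped attributes under their short name is an assumption.

```python
def expand_attributes(record, linkage):
    """Replace short attribute names with the uris listed in attributeList."""
    attribute_map = linkage['attributeList']
    expanded = {}
    for key, value in record.items():
        mapping = attribute_map.get(key)
        # Attributes without a mapping keep their short name (assumption).
        expanded[mapping['mapTo'] if mapping else key] = value
    return expanded


linkage = {
    "attributeList": {
        "status": {"mapTo": "http://datasets.bibsoup.org/wsf/ontology/properties/status"},
        "name": {"mapTo": "http://datasets.bibsoup.org/wsf/ontology/properties/name"},
    }
}
record = {"name": "Jack Alves", "status": "test", "some_id": "a17469"}

expanded = expand_attributes(record, linkage)
for key in sorted(expanded):
    print(key, '=', expanded[key])
```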
DATASET CREATE
To create a dataset execute,
if (not response): response = Dataset.read('my_dataset')
Dataset.set('my_second_set')
Dataset.create()
If the create succeeds the response is empty. The structwsf /dataset/create/ call gives the creator full permissions by default, and creates an access permission record for public access (0.0.0.0) but gives no public permission. The bkn_wsf Dataset.create method adds full permissions for the drupal site associated with the repository (this is currently hard-coded for BKN servers).
DATASET PERMISSIONS
structwsf implements permissions for Create Read Update Delete (CRUD) on any service. bkn_wsf sets permissions on a fixed set of services including:
'crud/read/'
'crud/update/'
'crud/delete/'
'search/'
'browse/'
'dataset/read/'
'dataset/delete/'
'dataset/create/'
'dataset/update/'
'converter/irjson/'
'sparql/'
And may explicitly add the following in the future,
'auth/registrar/access/'
'auth/lister/'
'auth/validator/'
'import/'
'export/'
'ontology/create/'
To set access for a dataset use,
response = Dataset.set(Dataset.get(), 'public_access')
The 'default_access' option adds full permission for the drupal server associated with the repository. Both of the above calls use the lower level method,
A wrapper to 'update' permissions is not yet implemented. For a Drupal repository like people.bibkn.org, permissions can also be set with a web interface. When a new dataset is imported you must first make it visible to conStruct, the module that manages structWSF transactions on a Drupal system. You need to "link" the dataset to the conStruct instance from the page,
On that page,
Set "WSF Address" to your domain (Ex. people.bibkn.org)
Set "Existing Dataset Uri" to your dataset uri. Use the full uri, which may have a trailing forward slash '/' (Ex. http://www.people.bibkn.org/wsf/datasets/sandbox/)
Now, while logged in click on the "Dataset" link in the top right panel of a page. The datasets you have permissions to work with should be listed on that page. Datasets you have permission to manage should have a link to the right of the dataset that reads, "Manage Permissions". On the Manage Permissions page you can check boxes to set permissions for each user/ip address.
DATASET READ
To confirm creation of your dataset you can use Dataset.read, which returns information about a dataset. Dataset.read is one of the methods called by Dataset.list() after fetching a list of dataset ids, and uses similar parameters.
response = Dataset.read(ds_uri,'description')
response = Dataset.read(ds_uri,'access')
response = Dataset.read(ds_uri,'access_detail')
The 'description' option gives a response like,
"recordList": [],
"dataset": {
"contributor": {
"ref": "@@http://datasets.bibsoup.org/user/4/"
},
"description": "This is a public area for testing This dataset may be deleted at any time.",
"created": "2010-08-19",
"id": "http://datasets.bibsoup.org/wsf/datasets/sandbox/",
"title": "Public Sandbox"
}
}
'read_update' - read and update are True, create and delete are False
'restricted' - all access False (even though there is a permission record)
'full' - all access True
"description": "This is a public area for testing This dataset may be deleted at any time.",
"created": "2010-08-19",
"title": "Public Sandbox",
"access": {
"184.73.164.129": {
"registeredIP": "184.73.164.129",
"read": "True",
"create": "True",
"concise": "full",
"update": "True",
"type": "Access",
"id": "http://datasets.bibsoup.org/wsf/access/03c4db7dcb905a24cf0c7dd969e9988e",
"delete": "True"
},
"0.0.0.0": {
"registeredIP": "0.0.0.0",
"read": "False",
"create": "False",
"concise": "restricted",
"update": "False",
"type": "Access",
"id": "http://datasets.bibsoup.org/wsf/access/c252265a5b60d426d8e4ac3a0d6e4d66",
"delete": "False"
},
"66.92.4.19": {
"registeredIP": "66.92.4.19",
"read": "True",
"create": "True",
"concise": "full",
"update": "True",
"type": "Access",
"id": "http://datasets.bibsoup.org/wsf/access/baa84a3a94117bef577ccda9071b3bb7",
"delete": "True"
}
},
"contributor": {
"ref": "@@http://datasets.bibsoup.org/user/4/"
},
"id": "http://datasets.bibsoup.org/wsf/datasets/sandbox/"
}
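How the 'concise' value relates to the four permission flags can be sketched as follows. This is a reading of the three values listed above, not the code bkn_wsf uses; the function name concise_access and the fallback value 'other' are hypothetical.

```python
def concise_access(access):
    """Summarize the create/read/update/delete flags of an access record."""
    # Flags arrive as the strings "True" and "False" in the response.
    flags = {name: access.get(name) == 'True'
             for name in ('create', 'read', 'update', 'delete')}
    if all(flags.values()):
        return 'full'
    if not any(flags.values()):
        return 'restricted'
    if (flags['read'] and flags['update']
            and not flags['create'] and not flags['delete']):
        return 'read_update'
    return 'other'  # hypothetical label for combinations not listed above


public = {"registeredIP": "0.0.0.0", "read": "False", "create": "False",
          "update": "False", "delete": "False"}
creator = {"registeredIP": "66.92.4.19", "read": "True", "create": "True",
           "update": "True", "delete": "True"}

print(concise_access(public))
print(concise_access(creator))
```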
The 'access' call trims some structwsf information. 'access_detail' shows the full result which includes a list of services in each ip record.
"0.0.0.0": {
"registeredIP": "0.0.0.0",
"read": "False",
"create": "False",
"concise": "restricted",
"update": "False",
"webServiceAccess": [
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/sparql/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/converter/bibtex/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/converter/tsv/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/converter/irjson/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/search/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/browse/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/auth/registrar/ws/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/auth/registrar/access/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/dataset/create/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/dataset/read/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/dataset/update/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/dataset/delete/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/crud/create/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/crud/read/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/crud/update/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/crud/delete/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/ontology/create/"},
{"ref": "@@http://datasets.bibsoup.org/wsf/ws/auth/lister/"}
],
"type": "Access",
"id": "http://datasets.bibsoup.org/wsf/access/38661cf8b4e92a65c3c051058da278dc",
"delete": "False"
}
DATASET DELETE
To delete a dataset execute,
LIST DATASETS
The Dataset.list method returns a list of ids, permissions, or details about each dataset. The default is to return 'description'. Only datasets you have permission to read will be in the list.
response = Dataset.list('description') # list with id, title, description, date created
response = Dataset.list() # same as 'description'
response = Dataset.list('access') # list with 'description' and access permissions
A call with 'ids' returns a simple list of dataset identifiers in irjson format. The 'recordList' shows all datasets you have access to including a few that are used by structwsf itself.
"type": "Bag",
"id": "",
"li": [
{
"ref": "@@http://datasets.bibsoup.org/wsf/datasets/"
},
{
"ref": "@@http://datasets.bibsoup.org/wsf/datasets/sandbox/"
},
{
"ref": "@@http://datasets.bibsoup.org/wsf/datasets/demo/"
},
{
"ref": "@@http://datasets.bibsoup.org/wsf/"
},
{
"ref": "@@http://datasets.bibsoup.org/wsf/ontologies/"
}]
The default call is 'description' which returns a list with id, title, description, and date created. To see permission settings for datasets use 'access' or 'access_detail'. bkn_wsf inserts the 'access' object which includes structwsf attributes and adds 'concise' to provide a simple access value. See the DATASET READ section for more information about 'description', 'access', and 'access_detail'.
DATASET RECORDS LIST
The Dataset.get method takes 'record_data' as a key to request a list of records in a dataset, and return all attributes with values for each record.
Dataset.get('record_data') is called by most operations that return a list of records (Search, Browse) because the default for structwsf calls when returning lists is to return only ids.
RECORD CREATE, UPDATE, READ, DELETE
structwsf services for creating a record and updating a record are very similar. In bkn_wsf Record.add is used to create, and Record.update is used for modifications. However, the structwsf services for creating a record can also modify a record.
If Record.add is called with a record id that already exists the new record data adds to the existing data. If you send record data that contains an attribute in the existing record, and the value of the attribute is different, then that value is added to the attribute (the value becomes an array of values). (All structwsf data is ultimately stored as RDF triples. In the case described, a new triple is written.)
Record.update will rewrite the record. It is as if the existing record is deleted and created again with the new data. Any attributes or values in the existing record that are not in the new data are lost.
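The difference between the two operations can be modelled with a plain dictionary standing in for the repository. This is an in-memory sketch of the behaviour described above, not how structwsf or bkn_wsf implement it; add_record and update_record are hypothetical names.

```python
def add_record(store, record):
    """Merge record data into any existing record with the same id."""
    existing = store.setdefault(record['id'], {})
    for key, value in record.items():
        if key not in existing:
            existing[key] = value
        elif existing[key] != value:
            # A different value is added, so the attribute becomes a list.
            current = existing[key]
            values = current if isinstance(current, list) else [current]
            if value not in values:
                values = values + [value]
            existing[key] = values


def update_record(store, record):
    """Replace the stored record; attributes not in the new data are lost."""
    store[record['id']] = dict(record)


store = {}
add_record(store, {'id': '1', 'name': 'add'})
add_record(store, {'id': '1', 'name': 'add again', 'title': 'example'})
after_add = dict(store['1'])
print(after_add)

update_record(store, {'id': '1', 'name': 'update it'})
after_update = dict(store['1'])
print(after_update)
```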
structwsf requires the format of a complete dataset to execute these record operations. In particular, an object with 'dataset' and 'recordList' objects. bkn_wsf allows passing a simple record, and creates the required object using the default dataset template.
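A minimal sketch of that wrapping step is shown below. It assumes the required object is simply the template under 'dataset' plus a one-item 'recordList'; the exact layout bkn_wsf produces may differ. The name wrap_record is hypothetical and the template is an abbreviated version of the one shown in the SETUP section.

```python
import copy
import json

# Abbreviated template; see the SETUP section for the full example.
TEMPLATE = {
    "id": "",
    "schema": ["http://downloads.bibsoup.org/datasets/bibjson/bibjson_schema.json"],
    "linkage": ["http://www.bibkn.org/drupal/bibjson/iron_linkage.json"],
}


def wrap_record(record, template):
    """Wrap a simple record in an object with 'dataset' and 'recordList'."""
    # Copy the template so the caller's template is never modified.
    return {'dataset': copy.deepcopy(template), 'recordList': [record]}


wrapped = wrap_record({"id": "1", "name": "add"}, TEMPLATE)
print(json.dumps(wrapped, indent=2, sort_keys=True))
```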
In the case of Record.update, a record 'id' attribute is required in the record object. With Record.add, if the id is missing or empty then an id should be generated by structwsf. Both of these methods will use the active dataset uri if none is specified.
Record.set(record_id)
bibjson = {"name": "add","id": record_id}
response = Record.add(bibjson)
bibjson["name"] = "update it"
response = Record.update(bibjson)
The add and update service response is empty if the operation succeeds. To confirm the data is what you sent you can read it,
response = Record.read()
To delete a record you must explicitly specify the dataset uri and record uri. bkn_wsf does not use the active dataset and record if none are specified. Of course, you can pass in the current dataset and record.
BULK CREATE AND IMPORT
With the operations described above you can create your own operations to import data. bkn_wsf.py has a couple of example functions, data_import, and create_and_import.
data_import takes the name of a dataset and the name of a properly formatted bibjson file, and imports the data in chunks as specified. There are also parameters to indicate what record to start with in the json file and how many records to import. The 'start' and 'testlimit' parameters are useful to test a few records before executing the full import. In some cases, you may want to just delete and recreate the dataset.
testlimit = 0
import_interval = 100 # illustrative value, assumed to be the number of records per chunk
data_import('mass_import_test3', 'my_dataset.json', testlimit, import_interval)
create_and_import does exactly that: it performs Dataset.create() and then executes data_import.
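If you write your own import, the chunking logic might look like the sketch below. It is not the data_import implementation; chunk_records and its parameter names are hypothetical, and treating a limit of 0 as "no limit" is an assumption. Each yielded chunk would then be passed to a record operation such as Record.add.

```python
def chunk_records(records, start=0, limit=0, interval=100):
    """Yield lists of at most `interval` records, beginning at index `start`.

    A limit of 0 is treated as "no limit" (assumption).
    """
    if limit == 0:
        end = len(records)
    else:
        end = min(len(records), start + limit)
    for offset in range(start, end, interval):
        yield records[offset:min(offset + interval, end)]


records = list(range(10))

# Test run: skip the first two records and import only five, two at a time.
test_chunks = list(chunk_records(records, start=2, limit=5, interval=2))
print(test_chunks)

# Full run: everything, four records per chunk.
full_chunks = list(chunk_records(records, interval=4))
print(full_chunks)
```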