Search

Search endpoint versions: 1.1, 2

The Search Web service is used to perform full text searches on the structured data indexed on a structWSF instance. A search query can be as simple as a single keyword, or it can combine a series of complex filters. Each search query can be applied to all, or a subset of, the datasets accessible by the requester. All full text queries comply with the Lucene querying syntax.

Each search query can be filtered using the following criteria:

  1. Type of the record(s) being requested
  2. Dataset in which the record(s) are indexed
  3. Presence of an attribute describing the record(s)
  4. A specific value for a specific attribute describing the record(s)
  5. A distance from a lat/long coordinate (for geo-enabled structWSF instances)
  6. A range of lat/long coordinates (for geo-enabled structWSF instances)

Developers communicate with the Search Web service using the HTTP POST method. You may request mime types such as (1) text/xml, (2) application/rdf+xml, (3) application/rdf+n3 or (4) application/json; the complete list of accepted values is given in the endpoint information below. The content returned by the Web service is serialized using the requested mime type, and the data returned depends on the parameters selected.
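The following Python sketch shows one way such a POST request could be prepared. It is an illustration, not the reference client: the host in SEARCH_ENDPOINT and the helper name build_search_request are hypothetical, and sending the parameters as a form-encoded body is an assumption. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint location; this page elides the host as "http://[...]".
SEARCH_ENDPOINT = "http://example.org/ws/search/"


def build_search_request(query, accept="text/xml", endpoint=SEARCH_ENDPOINT, **params):
    """Build (but do not send) a POST request for the Search endpoint."""
    fields = {"query": query}
    fields.update(params)
    # urlencode() percent-encodes every value before it goes in the body.
    # Assumption: the endpoint reads its parameters from a form-encoded body.
    body = urlencode(fields).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = build_search_request("rdf", accept="application/json", items=10, page=0)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen() would send the query; the "Accept" header selects the serialization of the response.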

Version

This documentation page covers version 2 of this endpoint. See the top of this page for the documentation pages of the other versions of this endpoint.

Usage

This Web service performs full text searches, and filtered searches, on all the datasets hosted on a structWSF instance.

Web Service Endpoint Information

This section describes the permissions you need in the WSF (Web Service Framework) to send a query to this Web service endpoint, and how to access the endpoint.

To access this Web service endpoint you need the proper CRUD (Create, Read, Update and Delete) permissions on a specific graph (dataset) of the WSF. Without the proper permissions on this graph you won't be able to send any queries to the endpoint.

Needed registered CRUD permission:
  • Create: False
  • Read: True
  • Update: False
  • Delete: False

On the graph URIs:

  • URIs of the datasets to be queried

Here is the information needed to communicate with this Web service's endpoint. Descriptions of the parameters are included below.

Note: if a parameter has a default value, the requester can omit it and the default value will be used. Also, some baseline Web services may not offer other values than the default.

HTTP method:
  • POST

Possible "Accept:" HTTP header field values:

  • text/xml (structXML)
  • application/json (structJSON)
  • application/rdf+xml (RDF+XML)
  • application/rdf+n3 (N3/Turtle)
  • application/iron+json (irJSON)
  • application/iron+csv (commON)

URI:

  • http://[...]/ws/search/?query=param1&types=param2&datasets=param3&attributes=param4&attributes_boolean_operator=param5&include_attributes_list=param6&items=param7&page=param8&inference=param9&include_aggregates=param10&aggregate_attributes=param11&aggregate_attributes_object_type=param12&aggregate_attributes_object_nb=param13&distance_filter=param14&range_filter=param15&registered_ip=param16&interface=param17&lang=param18&sort=param19&results_location_aggregator=param20&extended_filters=param21&types_boost=param22&datasets_boost=param23&attributes_boost=param24&spellcheck=param25

URI dynamic parameters description:

Note: All parameters have to be URL-encoded

  • param1. Full text query. This query should comply with the Lucene Querying Syntax.
  • param2. (default: all). List of types of the records to be searched. Each type is separated by the ";" character. An example of such a list is "type-a;type-b;type-c", meaning: search for all the records that have one of these types.
  • param3. (default: all). List of dataset URIs to be searched. Each dataset URI is separated by the ";" character.
  • param4. (default: all). List of filtering attribute (property) URIs, each URL-encoded and separated by ";". An attribute URI can be followed by an un-encoded double colon "::" and a value restriction, which is applied as a filter on that attribute to perform attribute/value filtered searches. The query syntax can be used in the filtering value, and the value also has to be encoded. An example of this "attributes" parameter is: "http%3A%2F%2Fsome-attribute-uri::some%2Bfiltering%2Bvalue".
    • Auto-completion: a special markup, the double star "**", introduces an auto-completion behavior on the prefLabel core attribute when attribute/value filtering is used. It should be used like: "attributes=prefLabel::te**". This tells the search endpoint that the requester is performing an auto-completion task, so the endpoint ensures that auto-completion works for more than one word, including spaces.
    • Date queries: if the target attribute is defined in the ontology with the xsd:dateTime datatype in its range, date queries can be used in this filter. If a single date is specified, such as 2001-05-24, all the records from that date until now are returned. If a range of dates is specified, such as [1999 to 2010], all the records between these two dates are returned. A date can be written as almost any English textual datetime description.
    • Numeric queries: if the target attribute is defined in the ontology with the xsd:int or the xsd:float datatype in its range, numeric queries can be used in this filter. If a single number is specified, such as 235, all the records with that attribute/value are returned. If a range of numbers is specified, such as [235 to 900], all the records between these two numbers are returned.
    • Range syntax: a range of dates or numbers has to be enclosed in square brackets, and the separator between the two bounds has to be " to " (a space, the word "to" and another space). The star character (*) can be used as a bound to denote "any" (any number, any date, etc.), as in [235 to *].
  • param5. (default: and). Tells the endpoint which boolean operator ("or" or "and") to use when doing attribute/value filtering. This parameter affects all the attribute/value filters. One of:
    • "or": Use the OR boolean operator between all attribute/value filters. This means that if the user filters with three attributes, the returned records will be described using at least one of the three.
    • "and": Use the AND boolean operator between all attribute/value filters. This means that if the user filters with three attributes, the returned records will be described using all three.
  • param6. (optional). A list of attribute URIs to include in the resultset. Sometimes you may be dealing with datasets where the descriptions of the entities are composed of thousands of attributes/values. Since the Search web service endpoint returns the complete entity descriptions in its resultsets, this parameter lets you restrict the attributes/values included in the resultset, which considerably reduces the size of the resultset to transmit and manipulate. Multiple attribute URIs can be added to this parameter by separating them with ";". If "none" is specified for this parameter, only the "uri" and the "type" of the results will be returned. If one or more property URIs are specified for this parameter, then these properties, the "uri", the dataset provenance and the "type" will be returned for this search query.
  • param7. (default: 10). The number of items to return in a single resultset.
  • param8. (default: 0). The offset of the resultset to return. For example, to get items 90 to 100, this parameter should be set to 90.
  • param9. (default: on). One of:
    • "on": Inference is enabled
    • "off": Inference is disabled
  • param10. (default: false). One of:
    • "true": Aggregation data included in the resultset
    • "false": Aggregation data not included in the resultset
  • param11. Specifies a set of attribute URIs for which aggregated values are wanted. The URIs should be URL-encoded. Each attribute for which aggregated values are wanted should be separated by a semicolon ";". This is used to get a list of values, and their counts, for a given attribute.
  • param12. (default: literal). Determines what kind of object value the search endpoint returns as aggregate values for the attributes listed in the aggregate_attributes parameter. One of:
    • "literal": The aggregated value returned by the endpoint is a literal. If the value is a URI (a reference to some record), then the literal value will be the preferred label of that referred record.
    • "uri": If the value of the attribute(s) is a URI (a reference to some record) then that URI will be returned as the aggregated value.
    • "uriliteral": If the value of the attribute(s) is a URI (a reference to some record) then that URI and its preferred label will be returned as the aggregated value.
  • param13. (default: 10). Determines the number of values to aggregate for each attribute listed in aggregate_attributes for this query. If the value is -1, all possible values for the target attributes are returned.
  • param14. The distance filter is a series of parameters used to filter the records of the dataset according to their distance from a given lat;long point. The values are separated by a semicolon ";". The format is as follows: lat;long;distance;distanceType. The distanceType can have two values, 0 or 1: 0 means that the distance is specified in kilometers and 1 means that it is specified in miles. An example is: -98.45;10.4324;5;0, which returns all the results located at most 5 kilometers from the lat/long position.
  • param15. The range filter is a series of parameters used to filter the records of the dataset according to the rectangular bounds that contain their lat;long position. The values are separated by a semicolon ";". The format is as follows: top-left-lat;top-left-long;bottom-right-lat;bottom-right-long. Returned results will be located within that region.
  • param16. Target IP address registered in the WSF.
  • param17. (default: default). Source interface used for this web service query. An interface is a different way to process a query (different algorithms, different data management systems, etc.).
  • param18. (default: en) Language of the records to be returned by the search endpoint. Only the textual information of the requested language will be returned to the user. If no textual information is available for a record, for a requested language, then only non-textual information will be returned about the record.
  • param19. Sorting criteria for this query. Sort can be used with "type", "dataset", "uri", "preflabel", "score" or any other URL-encoded attribute URI that is defined with a maximum cardinality of 1. A sorting field is followed by a space character and a direction, "desc" or "asc". Multiple sorting criteria can be added by separating them with ";". Here is an example of a sort by type: "type desc". Here is an example of a sort by type and dataset: "type desc; dataset asc". Here is an example of a sort on a custom attribute: "http%3A%2F%2Fpurl.org%2Fontology%2Firon%23prefURL desc". By default the sorting order is "asc".
  • param20. Specifies a lat/long location around which all the results should be aggregated. For example, suppose a set of results is located within a region. If we do not want the results spread everywhere in that region, we specify a location for this parameter such that all results get aggregated around that specific location within the region. The value should be: "latitude,longitude". For example: "49.92545999127249,-97.14934608459475"
  • param21. Extended filters are used to define more complex filtered searches. This parameter uses a richer syntax which enables the grouping of filter criteria and the usage of the AND, OR and NOT boolean operators. The grouping is done with parentheses. Each filter is composed of a URL-encoded attribute URI to filter on, followed by a colon and the value to filter with. The full Lucene syntax can be used to define the value to filter. If all values are required, the "*" (star) operator should be used as the value. If the value of an attribute needs to be considered a URI, then the "[uri]" syntax should be added at the end of the attribute filter like: "http%3A%2F%2Fpurl.org%2Fontology%2Ffoo%23friend[uri]:http%3A%2F%2Fbar.com%2Fmy-friend-uri". That way, the value of that attribute filter will be handled as a URI. There is a series of core attributes that can be used without specifying their full URI: dataset, type, inferred_type, prefLabel, altLabel, lat, long, description, polygonCoordinates, polylineCoordinates and located in. The extended filters are not a replacement for the attributes, types and datasets filtering parameters; they are an extension of them. Additional filtering criteria can be defined in the extended filtering parameter. The resolution logic used by the Search endpoint is: attributes AND datasets AND types AND extended-filters. An example of such an extended query is: (http%3A%2F%2Fpurl.org%2Fontology%2Firon%23prefLabel:cancer AND NOT (breast OR ovarian)) AND (http%3A%2F%2Fpurl.org%2Fontology%2Fnhccn%23useGroupSignificant[uri]: (http%3A%2F%2Fpurl.org%2Fontology%2Fdoha%23liver_cancer OR http%3A%2F%2Fpurl.org%2Fontology%2Fdoha%23cancers_by_histologic_type)) AND dataset:"file://localhost/data/ontologies/files/doha.owl". Note: both the URI and the value (all kinds of values: literals and URIs) need to be URL-encoded before being sent to the Search endpoint.
  • param22. Modifies the score of the results returned by the Search endpoint by boosting the results that have a given type; the boosting factor is the weight applied within the overall scoring algorithm. The type URIs to boost are URL-encoded and separated by semicolons. The boosting factor is appended to the encoded type URI after a "^" character. Boosting a type only impacts the scoring/relevancy of the returned results; it does not restrict which results are returned by the endpoint. Here is an example of two boosted types: urlencode(type-uri-1)^30;urlencode(type-uri-2)^300
  • param23. Modifies the score of the results returned by the Search endpoint by boosting the results that belong to a given dataset; the boosting factor is the weight applied within the overall scoring algorithm. The dataset URIs to boost are URL-encoded and separated by semicolons. The boosting factor is appended to the encoded dataset URI after a "^" character. Boosting a dataset only impacts the scoring/relevancy of the returned results; it does not restrict which results are returned by the endpoint. Here is an example of two boosted datasets: urlencode(dataset-uri-1)^30;urlencode(dataset-uri-2)^300
  • param24. Modifies the score of the results returned by the Search endpoint by boosting the results that have the given attribute(s) or attribute/value pair(s); the boosting factor is the weight applied within the overall scoring algorithm. This parameter is used to boost the relevancy of the returned records if they are described with a particular attribute URI, or with a particular attribute URI and a particular value for that attribute. The attribute URIs to boost are URL-encoded and separated by semicolons. If a value is specified for an attribute, it is separated from the attribute URI by a double colon "::" and is itself URL-encoded. The boosting factor is appended after a "^" character, at the end of the encoded attribute URI or of the encoded value. Boosting an attribute/value only impacts the scoring/relevancy of the returned results; it does not restrict which results are returned by the endpoint. Here is an example of a boosted attribute URI and another boosted attribute URI with a particular value: urlencode(attribute-uri-1)^30;urlencode(attribute-uri-2)::urlencode(some values)^300
  • param25. Includes spellchecking suggestions in the resultset when the resultset would otherwise be empty. In that case the search endpoint creates a resultset with a single result of type wsf:SpellSuggestion. The suggested query words are returned with the wsf:suggestion property along with their wsf:frequency, and the collated search is returned with the wsf:collation property. Suggested terms can be ordered based on their frequency.
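Several of the parameters above share the same small syntax: URL-encoded URIs, ";" separators, "::" before a value and "^" before a boosting factor. The Python sketch below assembles a few of these values as described in the list. The helper names (enc, attribute_filter, range_value, distance_filter, boosts) and the example.org URIs are hypothetical and exist only for illustration.

```python
from urllib.parse import quote


def enc(text):
    """Percent-encode a URI or value, escaping every reserved character."""
    return quote(str(text), safe="")


def attribute_filter(attribute_uri, value=None):
    """One `attributes` entry: the encoded URI, optionally followed by ::value."""
    encoded = enc(attribute_uri)
    return encoded if value is None else f"{encoded}::{enc(value)}"


def range_value(low="*", high="*"):
    """Range of dates or numbers, e.g. "[235 to 900]"; "*" stands for "any"."""
    return f"[{low} to {high}]"


def distance_filter(lat, long, distance, miles=False):
    """lat;long;distance;distanceType, where 0 = kilometers and 1 = miles."""
    return f"{lat};{long};{distance};{1 if miles else 0}"


def boosts(factors):
    """Join {uri: factor} pairs as encoded-uri^factor entries separated by ";"."""
    return ";".join(f"{enc(uri)}^{factor}" for uri, factor in factors.items())


# Hypothetical attribute and type URIs, for illustration only.
params = {
    "attributes": ";".join([
        attribute_filter("http://purl.org/ontology/iron#prefLabel"),
        attribute_filter("http://example.org/ontology#population", range_value(235, 900)),
    ]),
    "attributes_boolean_operator": "and",
    "distance_filter": distance_filter(49.92545999127249, -97.14934608459475, 5),
    "types_boost": boosts({"http://example.org/ontology#City": 30}),
    "sort": "type desc; dataset asc",
}
for name, value in params.items():
    print(f"{name}={value}")
```

Whether the HTTP layer should encode these already-encoded values a second time when placing them in the request is not settled by this page, so it is worth checking against a real structWSF instance.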

Available Sources Interfaces

A source interface is a way to process a web service query. Different source interfaces can be implemented for the same structWSF web service endpoint. Each interface processes the query differently, but all the queries sent to the web service endpoint are the same, with the exception of the interface parameter. Each interface shares the same API (the one defined by the web service endpoint), but their processing may differ (using different algorithms, different data management systems, etc.).

This is a list of the core interfaces for this endpoint. Organizations that host a structWSF network can create their own interfaces and make them available to their users. Such private source interfaces are not part of this list, and should be publicized by the organization.


Source Interface Name    Description
default                  Default source interface for this structWSF web service endpoint. This interface implements the default behavior of this structWSF endpoint.

Example of Returned XML Document

This is an example of the XML document returned by this Web service endpoint for a given query. This example searches two datasets for the keyword "rdf" and includes aggregates in the resultset.

Query:
  • http://[...]/ws/search/
  • Parameters: query=rdf&types=all&datasets=http%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F283%2F%3Bhttp%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F160%2F&items=10&page=0&inference=on&include_aggregates=true&registered_ip=self%3A%3A1

"Accept:" HTTP header field value:

  • text/xml

Result:

  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE resultset PUBLIC "-//Structured Dynamics LLC//Search DTD 0.1//EN" "http://constructscs.com:8890/ws/dtd/search/search.dtd">
  <resultset>
     <prefix entity="aggr" uri="http://purl.org/ontology/aggregate#"/>
     <subject type="http://purl.org/ontology/swt#Ontology" uri="http://constructscs.com/conStruct/datasets/122/resource/mopy">
        <predicate type="http://purl.org/dc/terms/isPartOf">
           <object type="http://rdfs.org/ns/void#Dataset" uri="http://constructscs.com/wsf/datasets/122/"/>
        </predicate>
        <predicate type="http://usefulinc.com/ns/doap#name">
           <object type="rdfs:Literal">mopy</object>
        </predicate>
        <predicate type="http://usefulinc.com/ns/doap#homepage">
           <object type="rdfs:Literal">http://www.sourceforge.net/projects/motools</object>
        </predicate>
        <predicate type="http://usefulinc.com/ns/doap#programming-language">
           <object type="rdfs:Literal">Python</object>
        </predicate>
        <predicate type="http://purl.org/ontology/swt#status">
           <object type="rdfs:Literal">Existing
           </object>
        </predicate>
     </subject>
     <subject type="aggr:Aggregate" uri="http://constructscs.com/wsf/ws/search/aggregate/8d4746ea554cfec324b0a740fbbc9be6/6ff6595d838e72f230b1b88974705166/">
        <predicate type="aggr:property">
           <object uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
        </predicate>
        <predicate type="aggr:object">
           <object uri="http://purl.org/ontology/swt#SearchEngine"/>
        </predicate>
        <predicate type="aggr:count">
           <object type="rdfs:Literal">5
           </object>
        </predicate>
     </subject>
  </resultset>
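A resultset in this structXML form can be read with a standard XML parser. The Python sketch below is one possible approach, assuming the subject/predicate/object layout shown in the example; the function name parse_resultset and the shortened SAMPLE document are illustrative and not part of structWSF.

```python
import xml.etree.ElementTree as ET

# Shortened sample modeled on the example document above.
SAMPLE = """<resultset>
   <prefix entity="aggr" uri="http://purl.org/ontology/aggregate#"/>
   <subject type="http://purl.org/ontology/swt#Ontology" uri="http://constructscs.com/conStruct/datasets/122/resource/mopy">
      <predicate type="http://purl.org/dc/terms/isPartOf">
         <object type="http://rdfs.org/ns/void#Dataset" uri="http://constructscs.com/wsf/datasets/122/"/>
      </predicate>
      <predicate type="http://usefulinc.com/ns/doap#name">
         <object type="rdfs:Literal">mopy</object>
      </predicate>
      <predicate type="http://purl.org/ontology/swt#status">
         <object type="rdfs:Literal">Existing
         </object>
      </predicate>
   </subject>
</resultset>"""


def parse_resultset(xml_text):
    """Return one dict per <subject>: its uri, its type and its predicate values."""
    records = []
    for subject in ET.fromstring(xml_text).iter("subject"):
        values = {}
        for predicate in subject.findall("predicate"):
            for obj in predicate.findall("object"):
                # References carry a "uri" attribute; literals carry text.
                value = obj.get("uri") or (obj.text or "").strip()
                values.setdefault(predicate.get("type"), []).append(value)
        records.append(
            {"uri": subject.get("uri"), "type": subject.get("type"), "values": values}
        )
    return records


for record in parse_resultset(SAMPLE):
    print(record["uri"], record["type"])
    for predicate_type, objects in record["values"].items():
        print("  ", predicate_type, objects)
```

Aggregate records (subjects of type aggr:Aggregate) come out of the same loop and can be told apart by their "type" value.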

HTTP Status Codes

Here are the possible HTTP status (error) codes returned by this Web service endpoint.

Depending on the error code and the specific error, a different message description can be issued (meaning a different error has been returned).

  • Code:200
    • Message: OK
  • Code:400
    • Message: Bad Request
    • Message description: The Search web service endpoint is not geo-enabled. Please modify your query such that it does not use any geo feature such as the distance_filter and the range_filter parameters.
    • Message description: No query specified for this request
    • Message description: The number of items returned per request has to be greater than 0 and lesser than 128
    • Message description: No dataset accessible by that user
    • Message description: No requester IP available
    • Message description: No Web service URI available
    • Message description: Target Web service XYZ not registered to this Web Services Framework
    • Message description: No access defined for this requester IP XYZ, dataset (XYZ) and Web service (XYZ)
    • Message description: The target Web service (XYZ) needs create access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
    • Message description: The target Web service (XYZ) needs read access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
    • Message description: The target Web service (XYZ) needs update access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
    • Message description: The target Web service (XYZ) needs delete access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
  • Code:406
    • Message: Not Acceptable
    • Message description: Unacceptable mime type requested
  • Code:500
    • Message: Internal Error
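A client can use the codes above to decide how to react to a response. The Python sketch below is a minimal illustration: the function names and the hint texts are hypothetical, and only the codes, the messages and the items bound come from this page.

```python
def explain_status(code):
    """Return (ok, hint) for a status code listed on this page."""
    if code == 200:
        return True, "OK"
    if code == 400:
        return False, "Bad Request: check the query, items, dataset access and geo parameters"
    if code == 406:
        return False, "Not Acceptable: request one of the supported Accept mime types"
    if code == 500:
        return False, "Internal Error: server-side failure"
    return False, f"Unexpected status {code}"


def items_in_range(items):
    """Client-side check mirroring the 400 message about the items parameter."""
    return 0 < items < 128


for code in (200, 400, 406, 500):
    print(code, explain_status(code))
print(items_in_range(10), items_in_range(128))
```

Checking the items value before sending the query avoids one of the documented causes of a 400 response.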