Fietstas REST web service reference
Fietstas is a text analysis system that provides a set of web services for processing the textual content of your documents. Here are a few things you can do with Fietstas:
- You can upload your documents to Fietstas, assigning unique identifiers for future reference
- You can request term clouds for specific documents or sets of documents
- You can request lists of named entities (NEs) that appear in specific documents or sets of documents
- You can provide feedback on Fietstas's automatic annotations; your feedback will be incorporated in the processing results that you request from Fietstas later
- You can visualize the results of your requests with XSLT/JavaScript provided by Fietstas, or you can use your own visualization/analysis tools
Fietstas performs most of the necessary document processing in the background and uses caching to minimize waiting time for your requests.
Service overview
Fietstas consists of several related web services:
- key service handles API keys: you need at least one key to use Fietstas as a web service. A key gives you access to your own namespace within Fietstas, which no one else can use.
- doc service deals with documents: here you upload your data, check what has been uploaded previously or modify your documents.
- tag service lets you list textual tags that you have assigned to your documents at upload.
- cloud service creates term clouds for your documents on request.
- annotation service allows you to obtain the content of specific documents automatically annotated by Fietstas.
- feedback service allows you to provide feedback on annotations, which Fietstas can use to improve the quality of subsequent annotations.
Key Service
To be able to use all the other services of Fietstas, a user first needs to obtain a unique API key. The key service deals with the generation and validation of API keys. There are only two methods: to generate a new key and to validate an existing key.
Requesting a new API key
- Host: http://fietstas.science.uva.nl
- Request: POST /key
- Parameters:
- name: the name of the user requesting a key; any string that identifies you or your project. This name will help you recover your key if you ever lose it. Required parameter.
Response in case of success:
HTTP 201 Created
<?xml version="1.0" encoding="utf-8"?>
<user>
<name>Your name</name>
<key>Your new API key</key>
<created>Key creation time</created>
<updated>Key update time</updated>
<access_count>Number of times the key has been used</access_count>
<access_limit>Maximum allowed number of key uses per day</access_limit>
</user>
Responses in case of failure:
- HTTP 403: attempt to generate too many keys in one day for a single IP address
- HTTP 400: no name is specified
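A minimal Python 3 sketch of the key request, using only the standard library. The helper names build_key_request and parse_user are hypothetical, not part of Fietstas; only the host, method and name parameter come from this reference. The sketch builds the request and parses a response body, so actually sending it assumes the service at the host above is reachable.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://fietstas.science.uva.nl"  # host given in this reference


def build_key_request(name):
    """Build (but do not send) a POST /key request for a new API key."""
    body = urllib.parse.urlencode({"name": name}).encode("utf-8")
    # urlopen adds Content-Type: application/x-www-form-urlencoded for data bodies
    return urllib.request.Request(BASE_URL + "/key", data=body, method="POST")


def parse_user(xml_text):
    """Turn a <user> response body (as a str) into a dict of element texts."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "") for child in root}


# Hypothetical usage (needs network access):
# with urllib.request.urlopen(build_key_request("my-project")) as resp:
#     user = parse_user(resp.read().decode("utf-8"))  # 201 Created
#     print(user["key"])
```

Note that urlopen raises urllib.error.HTTPError for the 403 and 400 failures listed above, so a caller would normally wrap the request in a try block.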
Validating an existing API key
- Host: http://fietstas.science.uva.nl
- Request: GET /key?key=Your-API-key
Response in case of success:
HTTP 200 OK
<?xml version="1.0" encoding="utf-8"?>
<user>
<name>Your name</name>
<key>Your API key</key>
<created>Key creation time</created>
<updated>Key last usage time</updated>
<access_count>Number of times the key has been used</access_count>
<access_limit>Maximum allowed number of key uses per day</access_limit>
</user>
Responses in case of failure:
- HTTP 404: key is not found
- HTTP 403: access denied
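Key validation can be sketched the same way. In this hypothetical validate_key helper, a 404 is reported as None and a 403 as PermissionError; that mapping is a design choice of the sketch, not something Fietstas prescribes. The opener parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://fietstas.science.uva.nl"  # host given in this reference


def validate_key(key, opener=urllib.request.urlopen):
    """Return the <user> fields for a valid key, or None if the key is unknown.

    `opener` is any callable with urlopen's calling convention.
    """
    url = BASE_URL + "/key?" + urllib.parse.urlencode({"key": key})
    try:
        with opener(url) as resp:
            root = ET.fromstring(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:  # key is not found
            return None
        if err.code == 403:  # access denied
            raise PermissionError("access denied for key " + key) from err
        raise
    return {child.tag: (child.text or "") for child in root}
```

Because the response repeats access_count and access_limit, the returned dict can also be used to check how much of the daily quota is left.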
Document Service
Before you can use documents in a term cloud or ask Fietstas to annotate them, you need to upload the documents to Fietstas. On upload, the document service stores documents, prepares them for processing (if requested by the user), and schedules the appropriate processing jobs. The document service contains methods to upload and to query documents.
Uploading a document
- Host: http://fietstas.science.uva.nl
- Request: POST /doc
- Content type: either application/x-www-form-urlencoded or multipart/form-data
- Parameters:
- key: API key; required parameter
- id: an optional identifier for the document. If provided, the identifier has to be unique within the "namespace" of the API key (unless replace is set, see below). If no id is given, the system generates a document id automatically.
- language: language of the document; dutch is the only valid value (default: dutch)
- tags: comma-separated list of tags for this document (default: none)
- tags can be used to group documents into (sub)collections
- a tag name should not contain whitespace characters (space, tab, newline, etc.) or the '*' character
- replace: whether to replace the document if a document with the same id already exists (within the API key). A value of 1 means the document's content and options will be replaced by those given in the request. A value of 0 means that an error is returned in case of conflicting ids. The default is 0.
- document: the content of the document itself; required parameter
- entities: 1 if named entities (persons, locations, organizations) should be extracted (default: 0)
- stems: 1 if the terms in the doc should be stemmed (default: 0)
- postags: 1 if parts of speech should be identified for the words in the document (default: 0)
- words: 1 if the document should be tokenized (default: 0)
Notes:
- Documents can be uploaded in two modes:
- via a usual application/x-www-form-urlencoded POST request where parameter document in the form provides the plain text of the document,
- or via a multipart/form-data POST request (e.g., a form with a file upload); in this case only Content-Type text/plain is accepted for the document
- see the W3C specification for details on form content types
- file uploads are usually supported by the HTTP libraries of the language you use (e.g., urllib2 for Python).
- You can replace documents only if you specify the document id in the request
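For the multipart/form-data mode, the request body can be assembled by hand with the Python standard library. This is a sketch under stated assumptions: the helper name build_multipart is hypothetical, the file part is sent under the form field name document (matching the parameter above), and the text is encoded as UTF-8, which this reference does not specify.

```python
import uuid


def build_multipart(fields, filename, text):
    """Encode form fields plus one text/plain file part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random string unlikely to occur in the text
    lines = []
    for name, value in fields.items():
        lines += [
            "--" + boundary,
            'Content-Disposition: form-data; name="%s"' % name,
            "",
            str(value),
        ]
    lines += [
        "--" + boundary,
        'Content-Disposition: form-data; name="document"; filename="%s"' % filename,
        "Content-Type: text/plain",  # the only type accepted for the document
        "",
        text,
        "--" + boundary + "--",  # closing delimiter
        "",
    ]
    body = "\r\n".join(lines).encode("utf-8")  # UTF-8 is an assumption
    return body, "multipart/form-data; boundary=" + boundary
```

The pair can then be sent with urllib.request.Request("http://fietstas.science.uva.nl/doc", data=body, headers={"Content-Type": ctype}, method="POST").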
Response in case of success:
HTTP 202 Accepted
<?xml version="1.0" encoding="utf8"?>
<docs xmlns="http://ilps.science.uva.nl/fietstas/doc">
<doc id="New document id"/>
</docs>
Responses if failure:
- HTTP 403 Key invalid
- HTTP 400 Unsupported Content Type
- HTTP 409 Conflict (the provided document id is not unique within the API key and the replace parameter is 0)
- HTTP 400 Ill-formed request (e.g., no document provided)
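A sketch of the form-encoded upload mode in Python 3. The helper names build_upload and uploaded_ids are hypothetical; the parameter names and 0/1 values come from the list above. The reference shows two different namespace URIs for <docs> responses (http://ilps.science.uva.nl/fietstas/doc here, http://fietstas.science.uva.nl/doc in the listing below), so the parser matches on the local element name only.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://fietstas.science.uva.nl"  # host given in this reference


def build_upload(key, text, doc_id=None, tags=(), replace=False, **flags):
    """Build a form-encoded POST /doc request.

    `flags` carries the 0/1 processing switches: entities, stems, postags, words.
    """
    params = {"key": key, "document": text, "replace": int(replace)}
    if doc_id is not None:
        params["id"] = doc_id
    if tags:
        params["tags"] = ",".join(tags)
    for name, enabled in flags.items():
        params[name] = int(bool(enabled))
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(BASE_URL + "/doc", data=data, method="POST")


def uploaded_ids(xml_text):
    """Collect the id attributes of <doc> elements, ignoring the namespace URI."""
    root = ET.fromstring(xml_text)
    return [el.get("id") for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "doc"]
```

For example, build_upload("key1234", text, doc_id="doc123", tags=["week1"], entities=True) requests named-entity extraction for a document tagged week1.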
Listing uploaded documents
- Host: http://fietstas.science.uva.nl
- Request: GET /doc?parameters
- Parameters:
- key: your API key (required)
- tags: tag expression to filter documents on (optional). A tag expression can be:
- a single tag name, in which case only documents uploaded with this tag will be listed;
- a conjunction (a comma-separated list) of tags; e.g., for tags=one,two,three only the documents with all three tags will be listed;
- a disjunction (a vertical-bar-separated list) of tags; e.g., for tags=one|two|three the documents with at least one of the three tags will be listed;
- a formula with conjunctions and disjunctions, where "|" binds more strongly than ","; e.g., for tags=one,two|three the documents with tag one and either two or three (or both) will be listed.
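Because "|" binds more strongly than ",", a tag expression is an AND of OR-groups. The hypothetical matches function below is a client-side sketch of these documented semantics, handy for checking locally which documents an expression ought to select; it is not Fietstas's own parser, and it assumes no whitespace inside expressions. Treating "*" as "every document" follows the cloud service description and is an assumption for the listing.

```python
def matches(expression, doc_tags):
    """Return True if a document carrying `doc_tags` satisfies `expression`.

    ',' means AND and '|' means OR; '|' binds more strongly, so
    "one,two|three" means one AND (two OR three).
    """
    if expression == "*":  # documented for the cloud service: every document
        return True
    tags = set(doc_tags)
    return all(
        any(alternative in tags for alternative in group.split("|"))
        for group in expression.split(",")
    )
```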
Response if success:
HTTP 200 OK
<?xml version="1.0" encoding="utf-8"?>
<docs total_count="Total number of matching documents" xmlns="http://fietstas.science.uva.nl/doc">
<doc id="one document ID"/>
<doc id="another document ID"/>
etc...
</docs>
Responses if failure:
- HTTP 403 Invalid key
- HTTP 404 No documents found
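A sketch of the listing call in Python 3, with hypothetical helper names. urlencode percent-encodes "," and "|" in the tag expression, which a standard form decoder on the server side should undo. Note that a 404 here means an empty result, so a client may want to treat it as "no documents" rather than as an error.

```python
import urllib.parse
import xml.etree.ElementTree as ET

BASE_URL = "http://fietstas.science.uva.nl"  # host given in this reference


def list_url(key, tags=None):
    """Build the GET /doc URL, optionally filtered by a tag expression."""
    params = {"key": key}
    if tags:
        params["tags"] = tags
    return BASE_URL + "/doc?" + urllib.parse.urlencode(params)


def parse_doc_list(xml_text):
    """Return (total_count, [document ids]) from a <docs> listing."""
    root = ET.fromstring(xml_text)
    ids = [el.get("id") for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "doc"]
    return int(root.get("total_count", len(ids))), ids
```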
Getting a document
Request:
- GET /doc/key1234/doc123/ (where key1234 is your API key and doc123 is the document id)
Request parameters:
- text: 1 to return the text content of the document (default: 1)
- meta: 1 to return the metadata of the document (default: 1)
Response if successful:
HTTP 200 OK
<?xml version="1.0" encoding="utf8"?>
<docs xmlns="http://ilps.science.uva.nl/fietstas/doc">
<doc>
<docid>doc123</docid>
<text>......</text>
<meta>
<tags>
<tag>testtag</tag>
<tag>week1</tag>
<tag>test3</tag>
</tags>
</meta>
</doc>
</docs>
Responses if failure:
- HTTP 403 Key error
- HTTP 404 Document does not exist
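A Python 3 sketch for fetching a single document. Reading the URL pattern GET /doc/key1234/doc123/ as "API key and document id are path segments, text and meta are query parameters" is an assumption, as are the helper names. The parser expects the decoded response body as a str (e.g. resp.read().decode("utf-8")) and ignores namespace URIs.

```python
import urllib.parse
import xml.etree.ElementTree as ET

BASE_URL = "http://fietstas.science.uva.nl"  # host given in this reference


def document_url(key, doc_id, text=True, meta=True):
    """Build GET /doc/<key>/<doc_id>/ with the text and meta switches."""
    path = "/doc/%s/%s/" % (urllib.parse.quote(key, safe=""),
                            urllib.parse.quote(doc_id, safe=""))
    query = urllib.parse.urlencode({"text": int(text), "meta": int(meta)})
    return BASE_URL + path + "?" + query


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_document(xml_text):
    """Return docid, text and tags from a <docs><doc> response."""
    root = ET.fromstring(xml_text)
    result = {"docid": None, "text": None, "tags": []}
    for el in root.iter():
        name = local(el.tag)
        if name in ("docid", "text"):
            result[name] = el.text
        elif name == "tag":
            result["tags"].append(el.text)
    return result
```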
Deleting documents
[not implemented]
Location: /doc
HTTP method: DELETE
Parameters:
- key
- docs: comma-separated list of ids of the documents to delete
- tag: a single tag; documents with this tag will be deleted
Response:
- OK
Term Annotation Service
Requesting an annotation
Location: /anno
HTTP method: POST
Parameters:
- key
- docs
- tags
- annotations: comma-separated list of types of annotations to perform (XML tags will be inserted in this order)
- output: XML, XHTML or offsets (default: XML; offsets not implemented)
Response:
- XML with id and permalink to this annotation
Obtaining a previously requested annotation
Location: /anno
HTTP method: GET
Parameters:
- key
Term Cloud Service
The term cloud service deals with the generation of term clouds. Generated term clouds are wrapped in a cloud list so that dynamic tag clouds can also be generated using this service. When a request for a cloud list is submitted, a unique ID is assigned to the request. This request ID can be used later to access the generated cloud(s). The service performs all necessary processing of the documents, even if that processing was not requested when the documents were uploaded.
Location: /cloud
Generating a simple term cloud
Requests:
- GET /cloud/reqid1234/ to refer to clouds requested previously
- this request does not take any parameters
- POST /cloud/
Request parameters:
- key: API key
- name: A human readable name for the cloud (default: empty)
- description: A human readable description for the cloud (default: empty)
- docs: comma-separated list of document ids (docids) that should be used for generating the cloud
- tags: tags that should be used to select the documents for cloud generation; you can specify a single tag, a list of tags or a more complex expression:
- tags=X will select all documents with tag X;
- tags=X,Y,Z will select all documents that are tagged with all three tags X, Y and Z;
- tags=X|Y|Z will select all documents that are tagged with any of the tags X, Y or Z;
- tags=A|B,X|Y|Z will select all documents that are tagged with A or B, and at the same time with any of the tags X, Y or Z (i.e., "|" binds more strongly than "," in tag expressions);
- tags=* will select all documents uploaded with the current API key
- link: template to generate external links for each term in a cloud (default: empty)
- The special sequence {{TERM}} in the template will be replaced by the actual term
- example: link=http://example.org/term.php?q={{TERM}}
- score: the way the term score in the output is computed:
- frequency: the normalized frequency of terms (default)
- ll: (not implemented) the normalized log-likelihood statistic. If this is used, the following options should also be specified:
- corpus: background corpus to use: a list of document ids or a single tag (e.g., doc1,doc2 or tag1). If not specified (or if the tag '*' is specified), all documents posted under the API key are used as the background corpus.
- lambda: the smoothing parameter (default: 0.05)
- parsimonious: terms are scored according to the (normalized) parsimonious language model of the document. The following options should be specified:
- corpus: document ids or tags, in the same format as for score=ll
- gamma: the parameter of the model, a number between 0 and 1 (default: 0.15)
- normalization: how the term scores in the result should be normalized. The value has 3 fields separated by ',':
- min: the lowest score value in the result
- max: the highest score value in the result
- func: the normalization function: "lin" for linear normalization and "log" for log-normalization
- for example, if normalization=1,20,log, then the term scores will be scaled with a log(.) function and the values renormalized between 1 and 20.
- if normalization is not specified, the value "8,30,log" is used.
- size: The number of terms in a cloud (default: 50, max: 100)
- min_count: minimum count for terms; terms that occur fewer than min_count times will be ignored (default: 1)
- select: how terms should be ranked before the first size terms are selected:
- score: terms will be ordered by score (default)
- entities: first, take all entities (ordered by score), then all other terms (ordered by score)
- order: how terms will be ordered in the cloud
- dict: use the standard dictionary order: a < b < c < ... (default)
- entities: first, take all entities (dictionary-ordered), then all other terms (dictionary-ordered)
- words: 1 if words (i.e., non-names) should be included in the cloud (default 0)
- stems: 1 if words in the cloud should be stemmed (default 0; setting it to 1 implies words=1)
- stopwords: 1 if stopwords should be removed from the cloud (default 0)
- postags: comma-separated list of POS tags to filter terms (e.g., postags=noun,verb,adjective to show only nouns, verbs and adjectives) (default: use all POS tags)
- entities: 1 if named entities should be detected and normalized (default 0)
- variants: include term variants in the output (e.g., of named entities or stems)
- output:
- xml to get xml formatted output (default)
- xhtml for xhtml format
- json for the json variant (not available yet)
- xslt: id of the document to be used as the XSLT stylesheet to generate XHTML output. This parameter is used only if output=xhtml. If not specified, a default XSLT is used.
- Note: the XSLT should be uploaded to Fietstas as a normal document before this option can be used.
- count:
- occurrences to count the number of occurrences of terms in documents (default)
- documents to count the number of documents in which the terms occur (not implemented)
- filter: a comma-separated list of words to use as a filter. The filter can be used in two ways:
- Inclusive: only the listed terms are included in the cloud. Example: filter=dog,cat,fish lists only these three words
- Exclusive: the listed terms are left out of the cloud. Example: filter=!dog,cat,fish excludes these three words from the cloud.
- filter_document: (not implemented) id of the document containing terms to be used for filtering. As with the filter option, if the value of filter_document starts with '!', then the terms in the document with the following id will be excluded from the result. If the value does not start with '!', only terms in the document will be included in the result.
- Note: the document for filtering should be uploaded to Fietstas as a normal document before this option can be used.
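This reference does not spell out the exact formula behind the normalization parameter. One plausible reading, sketched below as an assumption rather than Fietstas's actual code, is: apply func to the raw scores, then rescale linearly so that the smallest transformed score becomes min and the largest becomes max. Under this reading the base of the logarithm does not matter, because the linear rescaling cancels it. The function name normalize is hypothetical.

```python
import math


def normalize(scores, spec="8,30,log"):
    """Rescale {term: score} according to a normalization=min,max,func value.

    Assumed semantics: apply func ("lin" = identity, "log" = natural log)
    to each raw score, then map the results linearly onto [min, max].
    """
    low_s, high_s, func = spec.split(",")
    low, high = float(low_s), float(high_s)
    if func == "log":
        values = {t: math.log(s) for t, s in scores.items()}  # scores must be > 0
    elif func == "lin":
        values = dict(scores)
    else:
        raise ValueError("unknown normalization function: %r" % func)
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # all scores equal; mapping them to max is an arbitrary choice
        return {t: high for t in values}
    return {t: low + (v - lo) * (high - low) / (hi - lo) for t, v in values.items()}
```

With normalization=1,20,log, raw scores 1, 10 and 100 map to 1, 10.5 and 20: the log compresses the gap between frequent and rare terms before the rescaling.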
Note:
- Either 'docs' or 'tags' needs to be present in the request; other parameters are optional
- You cannot have a background corpus of just one document.
- An exclusive filter always starts with a '!'
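The inclusive/exclusive behaviour of filter can be sketched as follows. The function name apply_filter is hypothetical, and it assumes exact, case-sensitive string matching; whether Fietstas compares against stems or lowercased terms, and whether it filters before or after the size cut-off, is not stated in this reference.

```python
def apply_filter(terms, spec):
    """Keep or drop terms according to a filter=... value.

    A leading '!' makes the filter exclusive (drop the listed words);
    otherwise it is inclusive (keep only the listed words).
    """
    exclusive = spec.startswith("!")
    words = set((spec[1:] if exclusive else spec).split(","))
    if exclusive:
        return [t for t in terms if t not in words]
    return [t for t in terms if t in words]
```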
Example request (with explicit list of documents):
POST /cloud key=key1234&name=Cloud&description=My+clouds&docs=doc1,doc2
Example request (using a tag to specify a collection of documents):
POST /cloud key=key1234&name=Cloud&description=My+clouds&tags=tag1
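The example requests above can be produced with a small Python 3 helper. build_cloud_request is a hypothetical name; it enforces only the documented rule that docs or tags must be present and passes every other documented parameter (size, score, corpus, normalization, ...) through unchanged as a form field.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://fietstas.science.uva.nl"  # host given in this reference


def build_cloud_request(key, docs=None, tags=None, **options):
    """Build a form-encoded POST /cloud request for a simple term cloud.

    `docs` is a list of document ids, `tags` a tag expression; at least one
    of them is required.
    """
    if docs is None and tags is None:
        raise ValueError("either docs or tags is required")
    params = {"key": key}
    if docs is not None:
        params["docs"] = ",".join(docs)
    if tags is not None:
        params["tags"] = tags
    params.update({name: str(value) for name, value in options.items()})
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(BASE_URL + "/cloud", data=data, method="POST")
```

For instance, build_cloud_request("key1234", tags="tag1", score="parsimonious", corpus="*", gamma=0.15) asks for a parsimonious cloud against all documents under the key.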
Generating a dynamic tag cloud
Requests:
- GET /cloud/reqid1234/ to refer to clouds requested previously
- this request does not take any parameters
- POST /cloud/
Request parameters:
- The same as with a simple cloud, except that the parameters docs, tags and link are replaced by:
- labels: comma-separated list of labels for clouds (default: cloud, i.e., a list with a single cloud labelled 'cloud')
- For every cloud (label in the examples below) you need to specify the following parameters:
- docs:label: comma-separated list of document ids (docids) that should be used for generating the cloud
- tags:label: tags that should be used to select the document for cloud generation
- name:label: a human readable name for the cloud (default: empty)
- description:label: a human readable description for the cloud (default: empty)
- link:label: template to generate external links for each term in a cloud.
- The special sequence {{TERM}} in the template will be replaced by the actual term
- example: link:label=http://example.org/term.php?q={{TERM}}
Note:
- Either 'docs:label' or 'tags:label' needs to be present for every cloud; other parameters are optional
- You need to specify at least one label
Example request (with explicit list of documents):
POST /cloud key=key1234&name=Cloud&description=My+clouds&labels=cloud1,cloud2&docs:cloud1=doc1,doc2&docs:cloud2=doc3,doc4&name:cloud2=Example
Example request (using a tag to specify a collection of documents):
POST /cloud key=key1234&name=Cloud&description=My+clouds&labels=cloud1,cloud2&tags:cloud1=tag1&tags:cloud2=tag2&name:cloud2=vsdfv&description:cloud2=blaat
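Since names such as docs:cloud1 are not valid Python identifiers, a dynamic cloud request is easiest to build from a dictionary keyed by label. The helper dynamic_cloud_params below is a hypothetical sketch; it checks the two documented rules (at least one label, and docs or tags for every cloud) and returns the urlencoded body, in which ":" is percent-encoded as %3A.

```python
import urllib.parse


def dynamic_cloud_params(key, clouds, **options):
    """Flatten per-cloud settings into label-suffixed form parameters.

    `clouds` maps each label to that cloud's own parameters, for example
    {"cloud1": {"docs": "doc1,doc2"}, "cloud2": {"tags": "tag2", "name": "Example"}}.
    Shared parameters (name, description, size, ...) go in `options`.
    """
    if not clouds:
        raise ValueError("at least one label is required")
    params = {"key": key, "labels": ",".join(clouds)}
    params.update({name: str(value) for name, value in options.items()})
    for label, settings in clouds.items():
        if "docs" not in settings and "tags" not in settings:
            raise ValueError("cloud %r needs docs or tags" % label)
        for name, value in settings.items():
            params["%s:%s" % (name, label)] = str(value)  # e.g. "docs:cloud1"
    return urllib.parse.urlencode(params)
```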
Responses (in both cases)
Response if valid request, but nothing processed yet:
HTTP 202 Accepted
...
<?xml version="1.0" encoding="utf8"?>
<request xmlns="http://ilps.science.uva.nl/fietstas/request">
<id>req123</id>
<status>wait</status>
<message>5</message>
</request>
Response if valid request and only a partial cloud is available:
HTTP 206 Partial cloud
...
<?xml version="1.0" encoding="utf8"?>
<request xmlns="http://ilps.science.uva.nl/fietstas/request">
<id>req123</id>
<status>partial</status>
<message>30%</message>
<cloud-list xmlns="http://ilps.science.uva.nl/fietstas/cloud">
<cloud>
...
</cloud>
</cloud-list>
</request>
Response if valid request and cloud is ready:
HTTP 200 OK
...
<?xml version="1.0" encoding="utf8"?>
<request xmlns="http://ilps.science.uva.nl/fietstas/request">
<id>req123</id>
<status>completed</status>
<message>100%</message>
<cloud-list xmlns="http://ilps.science.uva.nl/fietstas/cloud">
<cloud>
...
</cloud>
</cloud-list>
</request>
Along with the output, Fietstas also sends several HTTP headers to convey status information about the cloud request. These headers are the following:
- X-Fietstas-Request-Id (the request id)
- X-Fietstas-Request-Status (the request status)
- X-Fietstas-Request-Message (the request message)
If the request failed, the following responses may be given:
- HTTP 403 Invalid key
- HTTP 404 No documents found
- HTTP 404 Requestid not found
For the xml definition of a term cloud, see this page. The contents of the response depend on the value of status. Status can have the following values:
- wait (term cloud is being generated, but intermediate results are not yet available)
- partial (Term cloud is being generated, but there are intermediate results)
- completed (Term cloud generation is completed)
The message field depends on the status field as follows:
- waiting period (in seconds), if status is wait
- percentage completed if status is partial
If the status is either partial or completed, then the cloud-list element is present. Otherwise, it is left out.
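The status rules above suggest a simple polling loop: on wait, sleep for the number of seconds given in message; on partial, poll again; on completed, return the cloud-list element. The sketch below is a hypothetical client, not part of Fietstas. The one-second interval for partial results and the max_polls limit are arbitrary assumptions, and fetch and sleep are parameters so the loop can be tested offline.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

REQ_NS = "{http://ilps.science.uva.nl/fietstas/request}"
CLOUD_NS = "{http://ilps.science.uva.nl/fietstas/cloud}"


def parse_request(xml_text):
    """Return (request id, status, message, cloud-list element or None)."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext(REQ_NS + "id"),
        root.findtext(REQ_NS + "status"),
        root.findtext(REQ_NS + "message"),
        root.find(CLOUD_NS + "cloud-list"),
    )


def http_fetch(req_id, base_url="http://fietstas.science.uva.nl"):
    """GET /cloud/<req_id>/; 202 and 206 are 2xx, so urlopen does not raise."""
    with urllib.request.urlopen("%s/cloud/%s/" % (base_url, req_id)) as resp:
        return resp.read().decode("utf-8")


def wait_for_cloud(req_id, fetch=http_fetch, sleep=time.sleep, max_polls=60):
    """Poll until the status is 'completed' and return the cloud-list element."""
    for _ in range(max_polls):
        _, status, message, clouds = parse_request(fetch(req_id))
        if status == "completed":
            return clouds
        if status == "wait":
            sleep(float(message or 1.0))  # message holds the wait in seconds
        else:  # 'partial': message is a percentage such as "30%"
            sleep(1.0)
    raise TimeoutError("cloud request %s not completed" % req_id)
```

Instead of parsing the body, a client could also read the X-Fietstas-Request-Status and X-Fietstas-Request-Message headers, which carry the same information.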
