Alterra.ai Phraser API

Phraser is a semantic intent classifier for natural language questions and commands.

Using this API you may build your own conversational virtual agents, messenger bots, Alexa skills and Actions on Google. You may voice-enable mobile apps and IoT devices. You may build a fully autonomous solution or pair it up with humans.

Phraser is a query classifier. Given a set of queries and a set of classes (intents), it determines which class each query belongs to.

Users may ask the same question or give the same command in a multitude of semantically equivalent ways; Phraser reduces these paraphrases to one canonical form (assigns them to a pre-defined class). Your program may then reply with text or call any other function.

An intent is a group of semantically similar natural language queries. For example, queries like “How can I call you?”, “What is your phone number?” and “I need a contact number to call” all belong to a single “your phone number” intent.

You define the intents as you think fit.

The system is powered by Deep Learning algorithms and requires a training corpus of historic user queries, each labeled with its correct intent. In general, the bigger the corpus, the higher the classification quality. You upload your training corpus to the system via this API, too.

The system keeps a log of queries it receives. Incoming queries may be assigned correct intents by humans and added to the training corpus.

At the core of Phraser is phrase2vec, a sequence embedding algorithm developed by our company. It is like word2vec, but for multi-word questions and commands: it maps short phrases to vectors in a 300-dimensional space so that semantically similar phrases cluster together. These vectors are also available via this API.

Cosine similarity between vectors measures how semantically close the underlying phrases are. Unlike edit (Levenshtein) distance, it does not rely on shared words. For example, “How can I call you?” and “What is your phone number?” will be very close to each other, even though they have no words in common (so their word-level edit distance is as large as it can be).
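To make the measure concrete, here is a minimal Python sketch of cosine similarity: the dot product of two vectors divided by the product of their lengths. It assumes the vectors are plain lists of numbers, as in the vector field of a Vector response; the three-coordinate vectors below are made-up toy values, not real phrase2vec output.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two phrase vectors.

    Values near 1.0 mean the vectors point in nearly the same direction
    (semantically close phrases); values near 0.0 mean unrelated phrases.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Toy vectors; real phrase2vec vectors currently have 300 coordinates.
print(cosine_similarity([0.1, -0.2, 0.3], [0.2, -0.4, 0.6]))  # ≈ 1.0 (parallel vectors)
```

Note that only the direction of the vectors matters, not their length, which is why the two parallel toy vectors above score as maximally similar.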

The API has three main parts:

  1. Performing intent classification
  2. Defining intents
  3. Working with the training corpus and query log

The first part is used at serving time and performs the actual classification. The user enters a query; you pass it to this API, and it returns the candidate intents, ordered by matching score.

The other two parts are used ahead of time, to upload and edit data and to train the machine learning models.

The API may return more than one result. If your application is a fully automated bot, you may display only the first result, or several results. If you have a human in the loop, you may show several results to the agent and let them select which one to send to the end-user.

All APIs defined here follow the REST paradigm.

All methods require an API key, which is passed in the “Authorization” request header. You receive your API key automatically when you self-register for the service on Alterra’s website.

All GET methods take arguments as CGI (query-string) parameters in the URL.

All POST and PUT requests take arguments in the request body, which should be in JSON format.

All methods return JSON. The response body may be empty if the only result is an operation status, which is reported as the HTTP status code.

The end-point for this API is at http://next.alterra.ai/api/phraser/v1/
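The conventions above can be wrapped in a small helper. The following Python sketch uses only the standard library to build (but not send) a request: GET arguments and flags such as train=true go into the URL query string, request bodies are serialized to JSON, and the API key is placed in the Authorization header. Sending the key verbatim, without a prefix such as “Bearer ”, is an assumption, and build_request is a hypothetical name.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://next.alterra.ai/api/phraser/v1/"


def build_request(method: str, path: str, api_key: str,
                  params: Optional[dict] = None, body: Any = None) -> Request:
    """Build (but do not send) a request to the Phraser API.

    - params: encoded into the URL query string (GET arguments, train=true).
    - body: serialized as JSON into the request body (POST/PUT arguments).
    - api_key: sent as-is in the Authorization header (assumed format).
    """
    url = BASE_URL + path.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    headers = {"Authorization": api_key}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


# Builds GET http://next.alterra.ai/api/phraser/v1/search?query=How+can+I+contact+you
req = build_request("GET", "search", "YOUR_API_KEY", params={"query": "How can I contact you"})
```

To actually send the request, pass it to urllib.request.urlopen(); an error status raises urllib.error.HTTPError, which carries the HTTP status code.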

Entities

Each entity is represented by a JSON object.

Intent

An intent is a group (class) of semantically similar natural language queries. For example, queries like “How can I call you?”, “What is your phone number?” and “I need a contact number to call” all belong to a single “your phone number” intent.

You define the intents as you think fit.

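Field name | Type | Always present | Description
id | int | Y | Unique identifier of the intent
name | string | Y | Name of the intent
comment | string | N | Private comment, not visible to end-users
do_not_index | bool | N | If true, the intent is not indexed: it is not used for training and is not returned by the classifier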

Example:

{
    "id": 42,
    "name": "send_email"
}

Intent objects are used in Intents API.

Query

Users ask questions or give commands in their own words. Many of these questions (queries) are logically equivalent and should be attributed to the same intent. These paraphrases (referred to as “queries”) are stored in the system and used for training the AI.

These queries may be imported from the log of actual queries entered by actual users (see Log entry) or from other sources (e.g. public training corpora). You may even write them yourself: think of ways users may refer to a particular intent.

Field name | Type | Always present | Description
text | string | Y | Query’s text
hash | string | N | A hash of this query’s text, used as its ID
intent_id | int | Y | ID of the intent the query belongs to

intent_id is trusted: it is assigned by a human labeler and used for training the algorithm. intent_id=0 can be used to store not-yet-labeled queries.

Example:

{
    "text": "Where's the money Lebowski?",
    "hash": "1138",
    "intent_id": 0
}

Query objects are used in Queries API.

Together, Intents and Queries form the training corpus for Machine Learning.

Log entry

An actual user query that a user entered into the system in the past.

All or some of these user queries may be added to the training corpus. Others may be pre-filtered as irrelevant, garbage, spam, etc. and ignored. Thus, not all queries from the raw query log belong to the training corpus.

Field name | Type | Always present | Description
query_text | string | Y | User query text
search_id | string | Y | ID of the search, assigned when the search was performed
query_hash | string | Y | Normalized query text hash
timestamp | datetime | Y | Date and time in ISO 8601 (RFC 3339) format
intent_ids | list of ints | N | List of intent IDs returned by the search algorithm in response to this query, ordered by relevancy (see Search response)

Unlike intent ids in Queries, these intent ids cannot be trusted. They are assigned by the search algorithm and may contain errors.

Example:

{
    "query_text": "How do one discover new bots?",
    "query_hash": "F95C6BB30EC32F55",
    "search_id": "05F4F53B-4F5D8162-7852A351-4B90F22E",
    "timestamp": "2017-03-29T12:00:35Z",
    "intent_ids": [42, 17, 5]
}

Log entry objects are used in Logs API.

Search response

All results returned by the classifier in response to a given query, ordered by relevancy score.

Field name | Type | Always present | Description
search_id | string | Y | ID of the search
query_hash | string | Y | Normalized query text hash
timestamp | datetime | Y | Date and time in ISO 8601 (RFC 3339) format
results | list of search result objects | Y | List of search results, ordered by relevancy

Example:

{
  "search_id": "05F4F53B-4F5D8162-7852A351-4B90F22E",
  "query_hash": "F95C6BB30EC32F55",
  "timestamp": "2017-03-29T12:00:35Z",
  "results": [
    {
      "intent_id": 42,
      "title": "send_email"
    },
    {
      "intent_id": 17,
      "title": "get_contacts"
    },
    {
      "intent_id": 5,
      "title": "your_address"
    }
  ]
}

Search result

One candidate intent returned by the classifier.

Field name | Type | Always present | Description
intent_id | int | Y | Intent ID
title | string | Y | Intent title

Example:

{
    "intent_id": 42,
    "title": "send_email"
}

Search response object is returned by the Search API.

Vector response

This object is the output of our phrase2vec sequence embedding algorithm. phrase2vec is like word2vec, but for multi-word questions and commands: it maps short phrases to vectors in a 300-dimensional space so that semantically similar phrases cluster together. This object contains the vector, along with accompanying meta-information.

Field name | Type | Always present | Description
model_version | string | Y | Version of the model used to create the vector
vector | list of numbers | Y | List of vector coordinates (currently, 300 numbers)
text | string | N | Phrase (text) corresponding to the vector
text_hash | string | N | Normalized text hash
timestamp | datetime | N | Date and time in ISO 8601 (RFC 3339) format

Example (the vector is truncated to three coordinates for brevity):

{
  "model_version": "1",
  "text": "Give me a vector",
  "text_hash": "D95D4477D2BE0412",
  "timestamp": "2017-11-01T12:00:35Z",
  "vector": [
    0.1,
    -0.2,
    0.3
  ]
}

We may change the underlying Deep Learning model without notice. Thus, the model_version is always included in the response. Only vectors created with the same version of the model are comparable.
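Because of this, a client that stores vectors should keep the model_version alongside each one and refuse to compare vectors from different versions. A minimal Python sketch, assuming vector response objects have already been decoded from JSON into dictionaries; the function name comparable is hypothetical.

```python
from typing import Any, Dict


def comparable(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Return True if two decoded vector response objects may be compared.

    Only vectors created with the same model_version are comparable; the
    dimension check is an extra safety net in case a new model changes it.
    """
    return (
        a["model_version"] == b["model_version"]
        and len(a["vector"]) == len(b["vector"])
    )
```

If this returns False, one option is to request the older phrase’s vector again so that both vectors come from the current model.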

Vector response object is returned by the Vector API.

Pre-defined intents

There are a number of pre-defined intents (classes) for handling user queries that don’t belong to any of the intents you defined. Pre-defined intents are editable and can be manually deleted from the corpus.

Title | ID | Indexed | Description
Wrong Language | -4 | No | The user query is detected to be non-English.
Garbage | -5 | No | Completely useless queries: garbage, trash, spam. Not worth answering, ever.
Ignore | -3 | No | Meaningful queries worth answering that nevertheless should not be added to your corpus: off-topic, one-off, too ambiguous, too short, too long and complex, etc. You may forward them to humans.
To do | -2 | No | Meaningful on-topic queries that don’t have an intent in the current corpus, but should. You may create new intents and then re-assign these queries to them; hence the name “To do”.

Wrong Language: this API only supports English. The engine includes a language identification algorithm. When it detects a non-English query it returns id=-4.

Garbage, Ignore, To do: use these three classes for questions that don’t have intents in your corpus. (The table above describes the differences between the three.) If the user query is classified as one of these, you may display a “No results found” message to the user.

Since the queries that fall under these categories are quite different from legitimate ones, the classifier may be less accurate on them than on legitimate queries. Garbage in, garbage out.

Therefore, you have an option to deactivate some or all of these classes. You do it by setting ‘do_not_index = true’. (In fact, it is the default value.) If you do so, the ML system will not use the respective classes for training and will not reply with the “No results found” message. Instead, it will attempt to find a matching intent in the current corpus. Most likely, it will be incorrect. Garbage in – garbage out.

However, even if you decide to deactivate these classes, you should still use them when labeling the query log. If you label these “bad” queries as legitimate, it will wreak havoc on the algorithms. Besides, you may later decide to activate these classes, and your training corpus will be ready. You can always activate them by changing ‘do_not_index’ from ‘true’ to ‘false’.
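On the client side, the pre-defined intent IDs from the table above can drive the reply logic. The sketch below is a hypothetical Python helper: answers is your own mapping from intent IDs to reply texts, and the wording of both fallback messages is illustrative only.

```python
from typing import Dict

WRONG_LANGUAGE = -4
NO_RESULT_INTENTS = {-5, -3, -2}  # Garbage, Ignore, To do


def reply_for(intent_id: int, answers: Dict[int, str]) -> str:
    """Choose a reply for the top intent returned by the Search API."""
    if intent_id == WRONG_LANGUAGE:
        return "Sorry, only English is supported."
    if intent_id in NO_RESULT_INTENTS:
        return "No results found"
    # Fall back to the no-results message if there is no text for this intent.
    return answers.get(intent_id, "No results found")
```

Handling Wrong Language separately lets the application tell the user why nothing was found, rather than showing a generic message.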

Search API

This is the main part of the API. It is used at serving time and performs the actual classification. When a user queries your application, you pass the query to this API, which returns the search results, i.e. the list of candidate intents, ordered by matching score (relevancy).

Given a user query, find relevant intents: GET from /api/phraser/v1/search

Arguments

Field name | Type | Required | Description
query | string | Y | User query

Example: /api/phraser/v1/search?query=How+can+I+contact+you

Server reply

A search response object (list of search results, ordered by relevancy)

Example:

{
  "search_id": "73b61636-29b1-4bee-8845-3bc0ffe9a86a",
  "results": [
    {
      "intent_id": 42,
      "title": "send_email"
    },
    {
      "intent_id": 43,
      "title": "order_pizza"
    }
  ]
}
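Since the results are already ordered by relevancy, a fully automated bot can take the first result, while an application with a human in the loop can show the top few to an agent. A minimal Python sketch of that selection, assuming the response body has been read as a string; top_candidates is a hypothetical name, and int() is applied defensively in case intent_id arrives as a string.

```python
import json
from typing import List, Tuple


def top_candidates(response_body: str, k: int = 1) -> List[Tuple[int, str]]:
    """Return up to k (intent_id, title) pairs from a search response.

    The results list is already ordered by relevancy, so the first k
    entries are the k best candidates.
    """
    data = json.loads(response_body)
    return [(int(r["intent_id"]), r["title"]) for r in data["results"][:k]]


body = """{"search_id": "73b61636-29b1-4bee-8845-3bc0ffe9a86a",
           "results": [{"intent_id": 42, "title": "send_email"},
                       {"intent_id": 43, "title": "order_pizza"}]}"""
print(top_candidates(body))       # automated bot: [(42, 'send_email')]
print(top_candidates(body, k=3))  # human agent: both candidates
```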

phrase2vec API

This API exposes our phrase2vec sequence embedding algorithm. It is like word2vec, but for multi-word questions and commands: it maps short phrases to vectors in a 300-dimensional space so that semantically similar phrases cluster together.

This API takes a piece of text (phrase) as input and returns the respective vector.

phrase2vec is pre-trained on generic English phrases. The output does not depend on your training corpus or its intents.

We may change the underlying Deep Learning model without notice. However, the model_version is always returned in the response, so you can tell when the model has changed. Only vectors created with the same version of the model are comparable.

Given a phrase (text), calculate the vector: GET from /api/phraser/v1/vector

Arguments

Field name | Type | Required | Description
text | string | Y | Text (phrase) to convert into a vector

Example: /api/phraser/v1/vector?text=Hello+world

Server reply

A vector response object

Intents API

This part is used for uploading and editing the intents.

Each POST, PUT or DELETE endpoint accepts an optional CGI parameter train=true, which re-trains the ML models after the API call succeeds. If you plan a series of consecutive API calls that update the corpus, it is advisable not to use train=true; instead, call the Train Machine Learning API once to re-train the ML models after you finish editing the training corpus.

Add new intents: POST to /api/phraser/v1/intents/

Upload new intent(s) to the system.

Arguments

Field name | Type | Required | Description
body | list of intent objects | Y | New intents to upload

Example:

"body": [
    {
        "title": "send_email",
        "id": 42
    },
    {
        "title": "order_pizza",
        "id": 54
    }
]

Server reply

List of identifiers of just uploaded intents.

If the intents are uploaded with specific ids, these ids are honored. If they are uploaded without ids, the system assigns new ids automatically. The ids of all just-uploaded intents are returned here.

Example:

[42,54]

Update existing intents: PUT to /api/phraser/v1/intents/

Edit (replace) existing intent(s).

Arguments

Field name | Type | Required | Description
body | list of intent objects | Y | Data for the intents you are updating

Example:

"body": [
    {
        "title": "send_email",
        "id": 42
    },
    {
        "title": "order_pizza",
        "id": 54
    }
]

Server reply

List of ids for replaced intents (as a confirmation).

Example:

[42,54]

Delete one intent: DELETE /api/phraser/v1/intents/{intent_id}

Server reply

Appropriate HTTP status

Get a list of intents: GET from /api/phraser/v1/intents/

Export intents from the system. This can be used to retrieve all intents or a portion of them (with pagination).

Arguments

Field name | Type | Required | Description
offset | int | N | Offset in the sorted list of intents. May be negative, in which case it is counted from the end of the list. Default value is 0
limit | int | N | Maximum number of intents to return. Default value is 10

Example: /api/phraser/v1/intents?offset=42&limit=2

Server reply

List of intent objects

Example:

[
    {
        "title": "send_email",
        "id": 42
    },
    {
        "title": "order_pizza",
        "id": 43
    }
]
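Retrieving all intents means walking through the list page by page with offset and limit. The Python sketch below assumes a hypothetical fetch_page(offset, limit) callable that performs the GET request and returns the decoded JSON list, and it assumes that a page shorter than limit marks the end of the list. The same loop should work for the other paginated list endpoints in this document.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = List[Dict[str, Any]]


def iterate_all(fetch_page: Callable[[int, int], Page], limit: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield every object from a paginated list endpoint.

    fetch_page(offset, limit) is a hypothetical callable that performs
    GET .../intents/?offset=...&limit=... and returns the decoded list.
    Iteration stops as soon as a page comes back shorter than `limit`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            break
        offset += limit
```

When the total count is an exact multiple of limit, the loop makes one extra request that returns an empty page, which is harmless.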

Get one intent: GET from /api/phraser/v1/intents/{intent_id}

Export one specific intent, identified by intent_id, from the system.

Server reply

An intent object or an appropriate HTTP status and error message

Example:

{
    "title": "send_email",
    "id": 42
}

Queries API

This part is used for working with the query log and training corpus for Machine Learning.

There are two types of queries:

  1. All queries entered by users in the past – see Log entries
  2. Queries admitted to the training corpus – see Queries

This API deals with the latter.

These two sets overlap heavily but may not be equal. Only legitimate user queries should be added to the training corpus; garbage and spam should be discarded. On the other hand, some queries in the training corpus may come from sources other than logged user queries (e.g. other public training corpora).

Each POST, PUT or DELETE endpoint accepts an optional CGI parameter train=true, which re-trains the ML models after the API call succeeds. If you plan a series of consecutive API calls that update the corpus, it is advisable not to use train=true; instead, call the Train Machine Learning API once to re-train the ML models after you finish editing the training corpus.

Update queries: POST to /api/phraser/v1/queries/

Add new queries to the training corpus or update existing queries.

Only unique queries are actually added; existing queries are replaced. The result contains the list of query hashes corresponding to the given list of queries, so duplicate queries have the same hash. intent_id must be determined by human labelers and set in all queries.

Arguments

Field name | Type | Required | Description
body | list of query objects | Y | New queries to upload

intent_id field must be set in all queries

Example:

"body": [
    {
        "text": "Who should I contact about my booking?",
        "intent_id": 2
    },
    {
        "text": "Where is my confirmation?",
        "intent_id": 5
    },
    {
        "text": "WHERE IS MY CONFIRMATION???",
        "intent_id": 5
    }
]

Server reply

List of respective query hashes.

Example:

["67F227A57F1A496F", "47E25DF1D9BAF663", "47E25DF1D9BAF663"]
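Since intent_id must be set in every query, it can pay to check a batch on the client before uploading it. The returned hashes correspond one-to-one to the submitted queries, so pairing the two lists reveals which queries the system treated as duplicates, like the two “confirmation” queries in the example. A minimal Python sketch; validate_queries and group_by_hash are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, List


def validate_queries(queries: List[Dict[str, Any]]) -> None:
    """Raise ValueError unless every query has non-empty text and an int intent_id."""
    for i, q in enumerate(queries):
        text = q.get("text")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"query #{i}: missing or empty 'text'")
        intent_id = q.get("intent_id")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(intent_id, int) or isinstance(intent_id, bool):
            raise ValueError(f"query #{i}: 'intent_id' must be an int")


def group_by_hash(queries: List[Dict[str, Any]], hashes: List[str]) -> Dict[str, List[str]]:
    """Map each returned hash to the query texts that produced it."""
    if len(queries) != len(hashes):
        raise ValueError("expected exactly one hash per query")
    groups: Dict[str, List[str]] = defaultdict(list)
    for q, h in zip(queries, hashes):
        groups[h].append(q["text"])
    return dict(groups)
```

Any hash mapped to more than one text marks queries that the server normalized to the same form.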

Delete one query: DELETE /api/phraser/v1/queries/{query_hash}

Server reply

Appropriate HTTP status

Get a list of queries: GET from /api/phraser/v1/queries/

Export query objects from the training corpus (with pagination). This can be used to retrieve all queries or a portion of them.

Arguments

Field name | Type | Required | Description
offset | int | N | Offset in the sorted list of queries. May be negative, in which case it is counted from the end of the list
limit | int | N | Number of queries to return. Default value is 100

Example: /api/phraser/v1/queries?offset=1138&limit=1

Server reply

List of query objects

Example:

[
    {
        "hash": "23db4",
        "text": "Who should I contact about my booking?",
        "intent_id": 2
    }
]

Get one query: GET from /api/phraser/v1/queries/{query_hash}

Export one specific query, identified by hash, from the training corpus.

Server reply

A query object

Example:

{
    "hash": "34512",
    "text": "Where is my confirmation?",
    "intent_id": 13
}

API for queries attached to intents

The queries for which the correct intent is known with certainty can be retrieved and managed through this additional API which is intent-based. These queries are “attached” to the respective intent.

Update intent’s queries: POST to /api/phraser/v1/intents/{intent_id}/queries/

Attach queries to a specific intent. If given queries are attached to other intents, they are detached and attached to the selected intent.

Arguments

Field name | Type | Required | Description
body | list of query objects | N | Queries to attach

The hash and intent_id fields of the submitted query objects are ignored.

Example:

[
    {
        "text": "Where is my booking confirmation?"
    }
]

Server reply

List of respective query hashes

Example:

["706CB0285892CF2E"]

Delete one query attached to the intent: DELETE /api/phraser/v1/intents/{intent_id}/queries/{query_hash}

Server reply

Appropriate HTTP status

Get list of queries attached to the intent: GET from /api/phraser/v1/intents/{intent_id}/queries/

Arguments

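Field name | Type | Required | Description
offset | int | N | Zero-based offset in the list of queries attached to the intent
limit | int | N | Maximum number of queries to return. Default value is 100, maximum value is 1000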

Example: /api/phraser/v1/intents/1/queries?offset=1138&limit=2

Server reply

List of query objects

Example:

[
    {
        "hash": "123e4",
        "intent_id": 1,
        "text": "Where's the money Lebowski?"
    },
    {
        "hash": "9ffc94",
        "intent_id": 1,
        "text": "Who should I contact about my booking?"
    }
]

Get one query attached to the intent: GET from /api/phraser/v1/intents/{intent_id}/queries/{query_hash}

Retrieve one specific query, identified by the query_hash

Server reply

Query object

Log API

This part is used for working with the raw log of all queries entered by end-users in the past – see Log entries. The set of queries in this log may not fully coincide with the Queries in the training corpus. Indeed, only legitimate user queries shall be added to the training corpus. Garbage and spam may be discarded.

The intended use of this API is as follows. The system logs all end-user queries. You retrieve them, one by one or in bulk, have humans assign the right intent id to each query, add them to the training set (via the Queries API), and invoke re-training of the ML models. You may use dedicated (hired) labelers or rely on end-user or community feedback.

For you, the raw query log is read-only. The system logs all end-user queries as is. You may retrieve them, but not modify them.
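The labeling step of this workflow can be sketched in Python as a conversion from labeled log entries to query objects ready for the Queries API. It assumes log entries carry the query_text field defined in the Log entry table; the labels mapping from search_id to a human-chosen intent_id is a hypothetical structure standing in for whatever labeling tool you use. The untrusted intent_ids produced by the search algorithm are deliberately not copied.

```python
from typing import Any, Dict, List


def to_training_queries(log_entries: List[Dict[str, Any]],
                        labels: Dict[str, int]) -> List[Dict[str, Any]]:
    """Build query objects from log entries that a human has labeled.

    Entries without a label are skipped. The machine-assigned intent_ids
    stored in each log entry are ignored, because they may contain errors.
    """
    queries = []
    for entry in log_entries:
        intent_id = labels.get(entry["search_id"])
        if intent_id is None:
            continue  # not labeled yet
        queries.append({"text": entry["query_text"], "intent_id": intent_id})
    return queries
```

Queries labeled with a pre-defined intent such as Garbage (-5) are kept, since those classes should still be used when labeling the query log.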

Get list of log entries: GET from /api/phraser/v1/log/

Export log entry objects from the raw query log (with pagination). This can be used to retrieve the entire query log or a portion of it.

Arguments

Field name | Type | Required | Description
offset | int | N | Offset in the list of log entries (sorted from oldest to newest). May be negative, in which case it is counted from the end of the list; e.g. offset=-N with limit=N returns the N newest entries
limit | int | N | Maximum number of log entries to return. Default value is 10, maximum value is 1000

Example: /api/phraser/v1/log?timestamp=2017-03-21&offset=4238&limit=1

Server reply

List of log entry objects

Example:

[
    {
        "query_text": "How can I pay?",
        "search_id": "73b61636-29b1-4bee-8845-3bc0ffe9a86a",
        "query_hash": "01C5A0BBD64963CC",
        "timestamp": "2017-03-21T13:23:53+00:00",
        "intent_ids": [ 3, 5, 8 ]
    }
]

Get one log entry: GET from /api/phraser/v1/log/{search_id}

Retrieve one specific log entry, identified by the search_id.

Server reply

Log entry object

Example:

{
    "query_text": "How can I pay?",
    "search_id": "73b61636-29b1-4bee-8845-3bc0ffe9a86a",
    "query_hash": "01C5A0BBD64963CC",
    "timestamp": "2017-03-21T13:23:53+00:00",
    "intent_ids": [ 23, 5, 8 ]
}

Train Machine Learning API

This API invokes re-training of the ML algorithms and re-indexing of the data after you finish editing the training corpus.

Train Machine Learning: POST to /api/phraser/v1/train

Re-train ML algorithms and re-index data.

Arguments

None

Server reply

Appropriate HTTP status
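As recommended in the Intents and Queries sections, a bulk update is best done as a series of calls without train=true, followed by a single call to this endpoint. A minimal Python sketch that only plans the sequence of (method, path) pairs for uploading several query batches; plan_corpus_update is a hypothetical name, and each pair would be sent with your HTTP client of choice.

```python
from typing import List, Tuple

API_ROOT = "/api/phraser/v1/"


def plan_corpus_update(num_batches: int, train_each_call: bool = False) -> List[Tuple[str, str]]:
    """Return the (method, path) calls needed to upload num_batches query batches.

    Recommended mode: upload every batch without train=true, then re-train
    once via POST /train. With train_each_call=True, every upload re-trains
    the models itself, which is what the docs advise against for bulk edits.
    """
    if train_each_call:
        return [("POST", API_ROOT + "queries/?train=true") for _ in range(num_batches)]
    calls = [("POST", API_ROOT + "queries/") for _ in range(num_batches)]
    calls.append(("POST", API_ROOT + "train"))
    return calls


for method, path in plan_corpus_update(3):
    print(method, path)
```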