5. Built-in Block Classes
Built-in block classes are block classes that are included in DialBB in advance.
Blocks that handle only Japanese are omitted from the explanation below.
5.1. Simple Canonicalizer (Simple String Canonicalizer Block)
(dialbb.builtin_blocks.preprocess.simple_canonicalizer.SimpleCanonicalizer)
Canonicalizes user input sentences. The main target language is English.
5.1.1. Input/Output
Input
`input_text`: input string (string). Example: "I like ramen".
Output
`output_text`: string after canonicalization (string). Example: "i like ramen".
5.1.2. Process Details
Performs the following processing on the input string.
Deletes leading and trailing spaces.
Replaces upper-case alphabetic characters with lower-case characters.
Deletes line breaks.
Converts a sequence of spaces into a single space.
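The four steps above can be sketched in a few lines of Python (a minimal illustration, not the actual DialBB implementation):

```python
import re

def canonicalize(text: str) -> str:
    """A minimal sketch of the canonicalization steps (not the actual DialBB code)."""
    text = text.strip()                               # delete leading and trailing spaces
    text = text.lower()                               # upper-case -> lower-case
    text = text.replace("\r", "").replace("\n", "")   # delete line breaks
    text = re.sub(r" +", " ", text)                   # collapse space runs into one space
    return text

print(canonicalize("  I   LIKE RAMEN  "))  # -> i like ramen
```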
5.2. LR-CRF Understander (Language Understanding Block using Logistic Regression and Conditional Random Fields)
(dialbb.builtin_blocks.understanding_with_lr_crf.lr_crf_understander.Understander)
Determines the user utterance type (also called intent) and extracts the slots using logistic regression and conditional random fields.
Performs language understanding in Japanese if the language element of the configuration is ja, and language understanding in English if it is en.
At startup, this block reads the knowledge for language understanding written in Excel and trains the models for logistic regression and conditional random fields.
At runtime, it uses the trained models for language understanding.
5.2.1. Input/Output
input
`tokens`: list of tokens (list of strings). Example:
`['I', 'like', 'chicken', 'salad', 'sandwiches']`
output
`nlu_result`: language understanding result (dict or list of dicts)

If the parameter `num_candidates` of the block configuration described below is 1, the language understanding result is a dictionary in the following format:

{"type": <user utterance type (intent)>, "slots": {<slot name>: <slot value>, ..., <slot name>: <slot value>}}

The following is an example.

{"type": "tell-like-specific-sandwich", "slots": {"favorite-sandwich": "roast beef sandwich"}}

If `num_candidates` is greater than 1, it is a list of multiple understanding result candidates:

[{"type": <user utterance type (intent)>, "slots": {<slot name>: <slot value>, ...}},
 {"type": <user utterance type (intent)>, "slots": {<slot name>: <slot value>, ...}},
 ...]
5.2.2. Block Configuration Parameters
- `knowledge_file` (string): Specifies the Excel file that describes the knowledge. The file path must be relative to the directory where the configuration file is located.
- `flags_to_use` (list of strings): Specifies the flags to be used. A row of a sheet is read only if one of these values is written in its `flag` column. If this parameter is not set, all rows are read.
- `canonicalizer`: Specifies the canonicalization to be performed when converting the language understanding knowledge to training data.
  - `class`: Specifies the class of the canonicalization block. Basically, the same canonicalization block used in the application is specified.
- `num_candidates` (integer; default value is 1): Specifies the maximum number of language understanding results (n for n-best).
- `knowledge_google_sheet` (hash): Specifies information for using Google Sheets instead of Excel.
  - `sheet_id` (string): Google Sheet ID.
  - `key_file` (string): Specifies the key file for accessing the Google Sheets API, as a relative path from the configuration file directory.
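Putting these parameters together, the block's part of the configuration file might look like the following sketch. The block name, file names, and flag values are illustrative assumptions, and the surrounding keys (`blocks`, `name`, `block_class`) follow the general DialBB block configuration format:

```yaml
blocks:
  - name: understander
    block_class: dialbb.builtin_blocks.understanding_with_lr_crf.lr_crf_understander.Understander
    knowledge_file: sample-knowledge-en.xlsx   # illustrative file name
    flags_to_use:
      - Y
      - T
    canonicalizer:
      class: dialbb.builtin_blocks.preprocess.simple_canonicalizer.SimpleCanonicalizer
    num_candidates: 3
```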
5.2.3. Language Understanding Knowledge
Language understanding knowledge consists of the following two sheets.
| sheet name | contents |
|---|---|
| utterances | examples of utterances by type |
| slots | relationship between slots and entities, and a list of synonyms |
The sheet name can be changed in the block configuration, but since it is unlikely to be changed, a detailed explanation is omitted.
5.2.3.1. utterances sheet
Each row consists of the following columns.

- `flag`: Flag indicating whether to use the row. `Y` (yes), `T` (test), etc. are often written. Which flags' rows to use is specified in the configuration. In the configuration of the sample application, all rows are used.
- `type`: User utterance type (intent).
- `utterance`: Example utterance.
- `slots`: Slots included in the utterance, written in the following form:

  <slot name>=<slot value>, <slot name>=<slot value>, ..., <slot name>=<slot value>

  The following is an example.

  location=philadelphia, favorite-sandwich=cheesesteak sandwich
The sheets that this block uses, including the utterances sheet, can have columns other than these.
5.2.3.2. slots sheet
Each row consists of the following columns.
- `flag`: Same as in the utterances sheet.
- `slot name`: Slot name. It is used in the example utterances in the utterances sheet and in the language understanding results.
- `entity`: The name of the dictionary entry. It is also included in language understanding results.
- `synonyms`: Synonyms joined by `,`.
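For example, rows of the slots sheet might look like the following (the values are illustrative):

```
flag | slot name         | entity               | synonyms
Y    | favorite-sandwich | cheesesteak sandwich | philly cheesesteak,cheese steak
Y    | location          | philadelphia         | philly
```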
5.3. ChatGPT Understander (Language Understanding Block using ChatGPT)
(dialbb.builtin_blocks.understanding_with_chatgpt.chatgpt_understander.Understander)
Determines the user utterance type (also called intent) and extracts the slots using OpenAI’s ChatGPT.
Performs language understanding in Japanese if the language element of the configuration is ja, and language understanding in English if it is en.
At startup, this block reads the knowledge for language understanding written in Excel, and converts it into the list of user utterance types, the list of slots, and the few shot examples to be embedded in the prompt.
At runtime, the input utterance is added to the prompt to make ChatGPT perform language understanding.
5.3.1. Input/Output
input
`input_text`: input string (string). The input string is assumed to be canonicalized. Example:
output
`nlu_result`: language understanding result (dict)

```json
{"type": <user utterance type (intent)>, "slots": {<slot name>: <slot value>, ..., <slot name>: <slot value>}}
```

The following is an example.

```json
{"type": "tell-like-specific-sandwich", "slots": {"favorite-sandwich": "roast beef sandwich"}}
```
5.3.2. Block Configuration Parameters
- `knowledge_file` (string): Specifies the Excel file that describes the knowledge. The file path must be relative to the directory where the configuration file is located.
- `flags_to_use` (list of strings): Specifies the flags to be used. A row of a sheet is read only if one of these values is written in its `flag` column. If this parameter is not set, all rows are read.
- `canonicalizer`: Specifies the canonicalization to be performed when converting the language understanding knowledge into the data embedded in the prompt.
  - `class`: Specifies the class of the canonicalization block. Basically, the same canonicalization block used in the application is specified.
- `knowledge_google_sheet` (hash): Specifies information for using Google Sheets instead of Excel.
  - `sheet_id` (string): Google Sheet ID.
  - `key_file` (string): Specifies the key file for accessing the Google Sheets API, as a relative path from the configuration file directory.
- `gpt_model` (string; default value is `gpt-4o-mini`): Specifies the ChatGPT model. `gpt-4o` can be specified. `gpt-4` cannot be used.
- `prompt_template` (string): Specifies the prompt template file as a relative path from the configuration file directory. When this is not specified, `dialbb.builtin_blocks.understanding_with_chatgpt.prompt_templates_ja.PROMPT_TEMPLATE_JA` (for Japanese) or `dialbb.builtin_blocks.understanding_with_chatgpt.prompt_templates_en.PROMPT_TEMPLATE_EN` (for English) is used.

A prompt template is a template of the prompts used to make ChatGPT perform language understanding. It can contain the following variables, each starting with `@`:

- `@types`: The list of utterance types.
- `@slot_definitions`: The list of slot definitions.
- `@examples`: So-called few-shot examples, each of which has an example utterance, its utterance type, and its slots.
- `@input`: The input utterance.

Values are assigned to these variables at runtime.
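As an illustration, a custom prompt template file might look like this. The wording is a hypothetical sketch; the actual built-in templates may differ:

```
Determine the type and slots of the user utterance below.

Utterance types:
@types

Slot definitions:
@slot_definitions

Examples:
@examples

Input utterance:
@input

Answer in JSON with "type" and "slots" keys.
```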
5.3.3. Language Understanding Knowledge
The description format of the language understanding knowledge in this block is exactly the same as that of the LR-CRF Understander. For more details, please refer to “Language Understanding Knowledge” in the explanation of LR-CRF Understander.
5.4. STN Manager (State Transition Network-based Dialogue Management Block)
(dialbb.builtin_blocks.stn_manager.stn_management)
It performs dialogue management using a state-transition network.
input
- `sentence`: user utterance after canonicalization (string)
- `nlu_result`: language understanding result (dictionary or list of dictionaries)
- `user_id`: user ID (string)
- `aux_data`: auxiliary data (dictionary) (not required, but specifying this is recommended)
output
- `output_text`: system utterance (string). Example: "So you like chicken salad sandwiches."
- `final`: a flag indicating whether the dialogue is finished (bool)
- `aux_data`: auxiliary data (dictionary). The auxiliary data of the input, updated in the action functions described below, plus the ID of the state transitioned to. Updates are not necessarily performed in action functions. The state transitioned to is added in the following format:

  {"state": "I like a particular ramen"}
5.4.1. Block configuration parameters
- `knowledge_file` (string): Specifies an Excel file describing the scenario. It is a relative path from the directory where the configuration file exists.
- `function_definitions` (string): The name of the module that defines the scenario functions (see "Function Definitions"). If there are multiple modules, connect them with `:`. The modules must be in the Python module search path. (The directory containing the configuration file is in the module search path.)
- `flags_to_use` (list of strings): Same as in the LR-CRF Understander.
- `knowledge_google_sheet` (object): Same as in the LR-CRF Understander.
- `scenario_graph` (Boolean; default value is `False`): If this value is `true`, the values in the `system utterance` and `user utterance example` columns of the scenario sheet are used to create the graph. This allows the scenario writer to see the state transition network intuitively.
- `repeat_when_no_available_transitions` (Boolean; default value is `false`): When this value is `true`, if there is no transition that matches the conditions, the same utterance is repeated without a transition.
- `multi_party` (Boolean; default value is `false`): When this value is set to `true`, the value of `user_id` is included in the dialogue history described in Section 5.4.4.1 and in the prompts of the built-in functions using large language models described in "Built-in functions using large language models".
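A block configuration combining these parameters might look like the following sketch. The block name, class name suffix, and file names are illustrative assumptions following the general DialBB block configuration format:

```yaml
blocks:
  - name: manager
    block_class: dialbb.builtin_blocks.stn_manager.stn_management.Manager  # class name assumed
    knowledge_file: scenario.xlsx              # illustrative file name
    function_definitions: scenario_functions   # illustrative module name
    flags_to_use:
      - Y
    repeat_when_no_available_transitions: true
```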
5.4.2. Dialogue Management Knowledge Description
The dialog management knowledge (scenario) is written in the scenario sheet in the Excel file.
Each row of the sheet represents a transition. Each row consists of the following columns.

- `flag`: Same as in the utterances sheet.
- `state`: The name of the source state of the transition.
- `system utterance`: Candidates for the system utterance generated in the state given by `state`. A `{<variable>}` or `{<function call>}` in the system utterance string is replaced by the value assigned to the variable during the dialogue or by the return value of the function call. This is explained in detail in "Variables and Function Calls in System Utterances". There can be multiple rows with the same `state`; all `system utterance` values in the rows having the same `state` become system utterance candidates, and one of them is chosen randomly.
- `user utterance example`: Example of a user utterance. It is only written to make the flow of the dialogue easy to understand, and is not used by the system.
- `user utterance type`: The user utterance type obtained by language understanding. It is used as a condition of the transition.
- `conditions`: A sequence of conditions, i.e., function calls that represent conditions for the transition. There can be more than one; if so, they are concatenated with `;`. Each condition has the form `<function name>(<argument 1>, <argument 2>, ..., <argument n>)`. The number of arguments can be zero. See "Function arguments" for the arguments that can be used in each condition.
- `actions`: A sequence of actions, i.e., function calls to execute when the transition occurs. If there is more than one, they are concatenated with `;`. Each action has the form `<function name>(<argument 1>, <argument 2>, ..., <argument n>)`. The number of arguments can be zero. See "Function arguments" for the arguments that can be used in each action.
- `next state`: The name of the destination state of the transition.
There can be other columns on this sheet (for use as notes).
If the user utterance type of the transition represented by a row is empty or matches the language understanding result, and the conditions are empty or all satisfied, the transition's condition is met and the transition is made to the state specified in the `next state` column. In this case, the actions described in the `actions` column are executed.

Rows with the same `state` column value (transitions with the same source state) are checked in top-to-bottom order to see whether they satisfy the transition conditions.

The default transition (a row whose `user utterance type` and `conditions` columns are both empty) must be the last of the rows having the same `state` column value. Unless `repeat_when_no_available_transitions` is `True`, the default transition is required.
5.4.3. Special states
The following state names are predefined.
- `#prep`: Preparation state. If this state exists, a transition from this state is attempted when the dialogue begins (when the client first accesses the system). The system checks whether all conditions in the `conditions` column of a row whose `state` column is `#prep` are met. If they are, the actions in that row's `actions` column are executed, the system transitions to the state in `next state`, and the system utterance for that state is output. This is used to change the initial system utterance and state according to the situation. The Japanese sample application changes the content of the greeting depending on the time of day when the dialogue takes place. This state is not required.
- `#initial`: Initial state. If there is no `#prep` state, the dialogue starts from this state when it begins (when the client first accesses the system). The system utterance for this state is placed in `output_text` and returned to the main process.

There must be either a `#prep` or an `#initial` state.

- `#error`: The system moves to this state when an internal error occurs. It generates a system utterance and exits.

A state whose name begins with `#final`, such as `#final_say_bye`, is a final state. In a final state, the system generates a system utterance and terminates the dialogue.
5.4.4. Conditions and Actions
5.4.4.1. Context information
STN Manager maintains context information for each dialogue session. The context information is a set of variables and their values (Python dictionary data), and the values can be any data structure. Condition and action functions can access the context information.
The context information is pre-set with the following key-value pairs.
| key | value |
|---|---|
| `_current_state_name` | name of the state before the transition (string) |
| `_config` | dictionary created by reading the configuration file |
| `_block_config` | the part of the configuration file for the dialogue management block (dictionary) |
| `_aux_data` | aux_data (dictionary) received from the main process |
| `_previous_system_utterance` | previous system utterance (string) |
| `_dialogue_history` | dialogue history (list) |
| `_turns_in_state` | the number of user turns in the current state (integer) |
| `_session_id` | the session ID of the current dialogue (string) |
| `_user_id` | the user ID of the most recent user utterance (string) |
The dialogue history has the following form.

[
  {
    "speaker": "user",
    "utterance": <canonicalized user utterance (string)>
  },
  {
    "speaker": "system",
    "utterance": <system utterance (string)>
  },
  {
    "speaker": "user",
    "utterance": <canonicalized user utterance (string)>
  },
  ...
]
In addition to these, new key/value pairs can be added within the action function.
5.4.4.2. Function arguments
The arguments of the functions used in conditions and actions are of the following types.
- Special variables (strings beginning with `#`). The following types are available:
  - `#<slot name>`: Slot value in the language understanding result of the previous user utterance (the input `nlu_result` value). If the slot value is empty, it is an empty string.
  - `#<key of auxiliary data>`: The value of this key in the input `aux_data`. For example, `#emotion` is the value of `aux_data['emotion']`. If this key is missing, it is an empty string.
  - `#sentence`: The immediately preceding user utterance (canonicalized).
  - `#user_id`: The user ID string.
- Variables (strings beginning with `*`): The value of a variable in the context information, in the form `*<variable name>`. The value of a variable must be a string. If the variable is not in the context information, it is an empty string.
- Variable references (strings beginning with `&`): Refer to a context variable in function definitions, in the form `&<context variable name>`.
- Constants (strings enclosed in `""`): Mean the string as it is.
5.4.5. Variables and Function Calls in System Utterances
In system utterances, parts enclosed in { and } are variables or function calls that are replaced by the value of the variable or the return value of the function call.
Variables that start with # are special variables mentioned above. Other variables are normal variables, which are supposed to be present in the context information. If these variables do not exist, the variable names are used as is without replacement.
For function calls, the functions can take arguments explained above as functions used for conditions or actions. The return value must be a string.
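For example, a `system utterance` cell combining both mechanisms might read as follows, where `user_name` is assumed to be a context variable set earlier by an action, and `get_recommendation()` is a hypothetical scenario function that returns a string:

```
Hello, {user_name}. Today I recommend {get_recommendation()}.
```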
5.4.6. Function Definitions
Functions used in conditions and actions (called "scenario functions") are either built into DialBB or defined by developers. A function used in a condition returns a Boolean value, while a function used in an action returns nothing.
5.4.6.1. Built-in functions
The built-in functions are as follows:
Functions used in conditions:

- `_eq(x, y)`: Returns `True` if `x` and `y` are the same. E.g., `_eq(*a, "b")` returns `True` if the value of variable `a` is `"b"`; `_eq(#food, "sandwich")` returns `True` if the `#food` slot value is `"sandwich"`.
- `_ne(x, y)`: Returns `True` if `x` and `y` are not the same. E.g., `_ne(#food, "ramen")` returns `False` if the `#food` slot is `"ramen"`.
- `_contains(x, y)`: Returns `True` if `x` contains `y` as a string. E.g., `_contains(#sentence, "yes")` returns `True` if the user utterance contains `"yes"`.
- `_not_contains(x, y)`: Returns `True` if `x` does not contain `y` as a string. E.g., `_not_contains(#sentence, "yes")` returns `True` if the user utterance does not contain `"yes"`.
- `_member_of(x, y)`: Returns `True` if the list formed by splitting `y` at `:` contains the string `x`. E.g., `_member_of(#food, "ramen:fried rice:dumplings")`.
- `_not_member_of(x, y)`: Negation of `_member_of`. E.g., `_not_member_of(*favorite_food, "ramen:fried rice:dumplings")`.
- `_num_turns_exceeds(n)`: Returns `True` when the number of user turns exceeds the integer represented by the string `n`. E.g., `_num_turns_exceeds("10")`.
- `_num_turns_in_state_exceeds(n)`: Returns `True` when the number of user turns in the current state exceeds the integer represented by the string `n`. E.g., `_num_turns_in_state_exceeds("5")`.
- `_check_with_llm(task)` and `_check_with_prompt_template(prompt_template)`: Make the judgment using a large language model. More details follow.
Functions used in actions:

- `_set(x, y)`: Sets `y` to the variable `x`. E.g., `_set(&a, b)` sets the value of `b` to `a`; `_set(&a, "hello")` sets `"hello"` to `a`.
Functions used in system utterances
- `_generate_with_llm(task)` and `_generate_with_prompt_template(prompt_template)`: Generate a string using a large language model (currently only OpenAI's ChatGPT). More details follow.
5.4.6.2. Built-in functions using large language models
The functions _check_with_llm(task) and _generate_with_llm(task) use a large language model (currently only OpenAI’s ChatGPT) along with dialogue history to perform condition checks and text generation. Here are some examples:
Example of a condition check:
_check_with_llm("Please determine if the user said the reason.")
Example of text generation:
_generate_with_llm("Generate a sentence to say it's time to end the talk by continuing the conversation in 50 words.")
To use these functions, the following settings are required:
Set OpenAI’s API key to environment variable
OPENAI_API_KEY.Please check websites and other resources to find out how to obtain an API key from OpenAI.
Add the following elements to the chatgpt block configuration:
- `gpt_model` (string): Specifies the model name of GPT, such as `gpt-4o` or `gpt-4o-mini`. The default value is `gpt-4o-mini`. `gpt-5` cannot be used.
- `instruction` (string): Used as the system role message when calling the ChatGPT API. It is only used during text generation. See the source code for the default value.
- `temperature` (float): Specifies the temperature parameter for GPT. The default value is `0.7`.
- `temperature_for_checking` (float): The temperature parameter of the GPT calls used during condition evaluation. If this is not specified, the value of `temperature` is used instead.
- `situation` (list of strings): A list that enumerates the situations to be written in the GPT prompt. If this element is absent, no specific situation is specified.
- `persona` (list of strings): A list that enumerates the system persona to be written in the GPT prompt. If this element is absent, no specific persona is specified.
e.g.:

chatgpt:
  gpt_model: gpt-4-turbo
  temperature: 0.7
  situation:
    - You are a dialogue system and chatting with the user.
    - You met the user for the first time.
    - You and the user are similar in age.
    - You and the user talk in a friendly manner.
  persona:
    - Your name is Yui
    - 28 years old
    - Female
    - You like sweets
    - You don't drink alcohol
    - A web designer working for an IT company
    - Single
    - You talk very friendly
    - Diplomatic and cheerful
_check_with_prompt_template(prompt_template) and _generate_with_prompt_template(prompt_template) perform condition checking and text generation by giving prompts to a large language model.
The prompts are created by replacing the placeholders in the specified prompt template with actual values.
To use these functions, you must set the environment variable OPENAI_API_KEY and configure the chatgpt element in the block configuration.
Here are some examples:
Example of condition checking:
_check_with_llm("Please determine whether the user has given a reason.")
Another example of condition checking:

_check_with_prompt_template("""
# Situation
{situation}

# Your persona
{persona}

# Dialogue history up to now
{dialogue_history}

# Task
Determine whether the user has given a reason, and answer with either 'yes' or 'no'.
""")
Example of string generation:
_generate_with_prompt_template("""
# Situation
{situation}

# Your persona
{persona}

# Dialogue history up to now
{dialogue_history}

# Task
Based on the dialogue so far, generate a closing utterance within 50 characters.
""")
Parts enclosed in `{` and `}` are placeholders. The available placeholders are:

- `{dialogue_history}`: Replaced with the dialogue up to that point, including the latest user utterance.
- `{situation}`: Replaced with the value of `situation` from the `chatgpt` element in the block configuration.
- `{persona}`: Replaced with the value of `persona` from the `chatgpt` element in the block configuration.
- `{current_time}`: Replaced with a string representing the current date, day of the week, and time (hour, minute, second) at which the dialogue is taking place.
- `{<a string consisting only of letters, digits, and underscores>}`: If the string exists as a key in `aux_data`, it is replaced with the corresponding value converted to a string.
Placeholder removal: if an unreplaced placeholder remains and is enclosed in `[[[` and `]]]`, that portion is removed.
5.4.6.3. Syntax sugars for built-in functions
Syntax sugars are provided to simplify the description of built-in functions.
- `<variable name>==<value>`: Means `_eq(<variable name>, <value>)`. E.g., `#favorite_sandwich=="chicken salad sandwich"`.
- `<variable name>!=<value>`: Means `_ne(<variable name>, <value>)`. E.g., `#NE_Person!=""`.
- `<variable name>=<value>`: Means `_set(&<variable name>, <value>)`. E.g., `user_name=#NE_Person`.
- `TT > <integer>`: Means `_num_turns_exceeds("<integer>")`. E.g., `TT>10`.
- `TS > <integer>`: Means `_num_turns_in_state_exceeds("<integer>")`. E.g., `TS>5`.
- `$<task string>$`: When used as a condition, it means `_check_with_llm("<task string>")`; when used in a system utterance, it means `{_generate_with_llm("<task string>")}`.

  Example of a condition:

  $Please determine if the user said the reason$

  Example of a text generation function call in a system utterance:

  I understand. $Generate a sentence to say it's time to end the talk by continuing the conversation in 50 words$ Thank you for your time.

  The older form `$"<task string>"$` is deprecated.
- `$$$<prompt template>$$$`: When used as a condition, it means `_check_with_prompt_template("<prompt template>")`; when used in a system utterance, it means `{_generate_with_prompt_template("<prompt template>")}`.
5.4.6.4. Function definitions by the developers
When developers define their own functions, they edit the module specified in the `function_definitions` element of the block configuration. The following is an example function definition.
from typing import Any, Dict

# ramen_map is assumed to be a dict defined in the same module,
# mapping ramen names to their places of origin.
def get_ramen_location(ramen: str, variable: str, context: Dict[str, Any]) -> None:
    location: str = ramen_map.get(ramen, "Japan")
    context[variable] = location
In addition to the arguments used in the scenario, an argument of dictionary type must be added at the end to receive the context information.

All arguments used in the scenario must be strings. In the case of a special variable or a variable, the value of the variable is passed as the argument. In the case of a variable reference, the variable name without the `&` is passed, and in the case of a constant, the string inside the `""` is passed.
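With the definition above, an `actions` cell in the scenario could call the function as follows (`#ramen` is assumed to be a slot in the language understanding result, and `location` a context variable name):

```
get_ramen_location(#ramen, &location)
```

At runtime, the slot value of `#ramen` is passed as the first argument, the string `"location"` as the second, and the context dictionary is appended automatically, so the function stores the looked-up location under `context["location"]`.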
5.4.6.5. Logging in functions
In scenario functions, logging can be performed using the following functions. The logs are written to standard output along with the session ID.
- `dialbb.builtin_blocks.stn_management.util.scenario_function_log_debug(message: str)`: Writes a log at the debug level.
- `dialbb.builtin_blocks.stn_management.util.scenario_function_log_info(message: str)`: Writes a log at the info level.
- `dialbb.builtin_blocks.stn_management.util.scenario_function_log_warning(message: str)`: Writes a log at the warning level.
- `dialbb.builtin_blocks.stn_management.util.scenario_function_log_error(message: str)`: Writes a log at the error level. In debug mode, this function also raises an exception.
5.4.7. Reaction
In an action function, setting a string to _reaction in the context information will prepend that string to the system’s response after the state transition.
For example, if the action function _set(&_reaction, "I agree.") is executed and the system’s response in the subsequent state is “How was the food?”, then the system will return the response “I agree. How was the food?”.
5.4.8. Continuous Transition
If a transition is made to a state where the first system utterance is $skip, the next transition is made immediately without returning a system response. This is used in cases where the second transition is selected based on the result of the action of the first transition.
5.4.9. Dealing with Multiple Language Understanding Results
If the input nlu_result is a list that contains multiple language understanding results, the process is as follows.
Starting from the top of the list, the block checks whether the type value of each candidate equals the user utterance type value of one of the possible transitions from the current state, and uses the first candidate for which such a transition exists. If no candidate meets this condition, the first language understanding result in the list is used.
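The selection procedure can be sketched as follows (a minimal illustration; `possible_types` stands for the user utterance type values of the transitions leaving the current state):

```python
from typing import Any, Dict, List

def select_nlu_result(candidates: List[Dict[str, Any]],
                      possible_types: List[str]) -> Dict[str, Any]:
    """Return the first candidate whose type matches a transition
    from the current state; otherwise fall back to the first candidate."""
    for candidate in candidates:
        if candidate["type"] in possible_types:
            return candidate
    return candidates[0]

candidates = [
    {"type": "ask-weather", "slots": {}},
    {"type": "tell-like-specific-sandwich",
     "slots": {"favorite-sandwich": "blt sandwich"}},
]
# The second candidate matches a transition from the current state.
selected = select_nlu_result(candidates, ["tell-like-specific-sandwich"])
```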
5.4.10. Subdialogue
If the destination state name is of the form #gosub:<state name1>:<state name2>, the system transitions to the state <state name1> and executes the subdialogue starting there. When the destination state of the subdialogue becomes :exit, the system moves to the state <state name2>.
For example, if the destination state name is #gosub:request_confirmation:confirmed, a subdialogue starting with request_confirmation is executed, and when its destination state becomes :exit, the system returns to confirmed.
It is also possible to transition to a subdialogue within a subdialogue.
5.4.11. Saving Context Information in an External Database
When operating the DialBB application as a web server, using a load balancer to distribute processing across multiple instances can handle request surges efficiently. By saving context information in an external database (MongoDB), a single session can be processed by different instances. (Feature added in version 0.10.0)
To use an external database, specify context_db element like the following in the block configuration:
context_db:
host: localhost
port: 27017
user: admin
password: password
Each key is defined as follows:
- `host` (string): The hostname where MongoDB is running.
- `port` (integer; default value: 27017): The port number used to access MongoDB.
- `user` (string): The username for accessing MongoDB.
- `password` (string): The password for accessing MongoDB.
5.4.12. Advanced Mechanisms for Handling Speech Input
5.4.12.1. Additional block configuration parameters
input_confidence_threshold(float; default value1.0) If the input is a speech recognition result and its confidence is less than this value, the confidence is considered low. The confidence of the input is the value ofconfidenceinaux_data. If there is noconfidencekey inaux_data, the confidence is considered high. In the case of low confidence, the process depends on the value of the parameter described below.confirmation_request(object)This is specified in the following form.
confirmation_request: function_to_generate_utterance: <function name (string)> acknowledgement_utterance_type: <user utterance type name of acknowledgement (string)> denial_utterance_type: <name of user utterance type for affirmation (string)>
If this is specified, the function specified in
function_to_generate_utterance is executed and its return value is uttered (this is called a confirmation request utterance), instead of making a state transition, when the input is less certain. The next process is then determined by the user utterance that follows the confirmation request:

If the type of the user utterance is the one specified by acknowledgement_utterance_type, the transition is made according to the user utterance received before the confirmation request utterance.

If the type of the user utterance is the one specified by denial_utterance_type, no transition is made and the utterance of the original state is repeated.

For any other user utterance type, a normal transition is performed.

However, if the input is a barge-in utterance (aux_data has a barge_in element whose value is True), this process is not performed.

The function specified by function_to_generate_utterance must be defined in the module specified by function_definitions in the block configuration. Its arguments are the nlu_result of the block's input and the context information, and its return value is the system utterance string.

utterance_to_ask_repetition (string)

If this is specified, then when the input confidence is low, no state transition is made and the value of this element is used as the system utterance. However, in the case of barge-in (aux_data has a barge_in element whose value is True), this process is not performed. confirmation_request and utterance_to_ask_repetition cannot be specified at the same time.

ignore_out_of_context_barge_in (Boolean; default value is False)

If this value is True, the input is a barge-in utterance (the value of barge_in in the aux_data of the request is True), and either the conditions for a transition other than the default transition are not met (i.e., the input is not expected in the scenario) or the confidence of the input is low, then no transition is made. In this case, barge_in_ignored in the response aux_data is set to True.

reaction_to_silence (object)

It has an action element, whose value is a string that is either "repeat" or "transition". If the value of action is "transition", a destination element is also required; its value is a string naming the destination state.

If the input aux_data has a long_silence key whose value is True, and the conditions for a transition other than the default transition are not met, the block behaves as follows, depending on this parameter:

If this parameter is not specified, normal state transitions are performed.

If the value of action is "repeat", the previous system utterance is repeated without a state transition.

If the value of action is "transition", the transition is made to the state specified by destination.
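For illustration, these parameters might be combined in a block configuration like the following. This is a hedged sketch: the function name, utterance types, and state name are assumptions, not taken from this manual, and confirmation_request and utterance_to_ask_repetition are mutually exclusive, so only one of them is shown.

```yaml
# Hypothetical excerpt of a scenario block configuration (names are examples).
input_confidence_threshold: 0.7
confirmation_request:
  function_to_generate_utterance: generate_confirmation_request  # defined via function_definitions
  acknowledgement_utterance_type: "yes"
  denial_utterance_type: "no"
ignore_out_of_context_barge_in: true
reaction_to_silence:
  action: transition
  destination: prompt_state   # required because action is "transition"
```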
5.4.12.2. Adding built-in condition functions
The following built-in condition functions have been added.

_confidence_is_low()

Returns True if the value of confidence in the input aux_data is less than or equal to the value of input_confidence_threshold in the configuration.

_is_long_silence()

Returns True if the value of long_silence in the input aux_data is True.
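The semantics of these two functions can be sketched in plain Python. This is an illustrative re-implementation, not DialBB's actual code; the dictionary layout follows the aux_data examples in this manual.

```python
def confidence_is_low(aux_data: dict, input_confidence_threshold: float) -> bool:
    # Mirrors _confidence_is_low(): True when the input confidence is
    # less than or equal to the configured threshold.
    return aux_data.get("confidence", 1.0) <= input_confidence_threshold


def is_long_silence(aux_data: dict) -> bool:
    # Mirrors _is_long_silence(): True when long_silence is set to True.
    return aux_data.get("long_silence") is True
```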
5.4.12.3. Ignoring the last incorrect input
If the value of rewind in the input aux_data is True, a transition is made from the state before the last response.
Any changes to the dialog context due to actions taken during the previous response will also be undone.
This function is used when speech recognition accidentally splits a user utterance in the middle and the system responds to only the first half.
Note that while the dialogue context is reverted, changes made in action functions to global variables or to the contents of an external database are not undone.
5.5. ChatGPT Dialogue (ChatGPT-based Dialogue Block)
(Changed in ver0.7)
(dialbb.builtin_blocks.chatgpt.chatgpt.ChatGPT)
Engages in dialogue using OpenAI’s ChatGPT.
5.5.1. Input/Output
Input
user_utterance: input string (string)

aux_data: auxiliary data (dictionary)

user_id: user ID (string)
Output
system_utterance: system utterance (string)

aux_data: auxiliary data (dictionary)

final: a boolean flag indicating whether the dialogue has finished
The input user_id is not used. The output aux_data is the same as the input aux_data and final is always False.
When using this block, you need to set your OpenAI API key in the environment variable OPENAI_API_KEY.
5.5.2. Block Configuration Parameters
first_system_utterance (string, default value is "")

The first system utterance of the dialogue.

user_name (string, default value is "User")

This string is used when providing the conversation history in the ChatGPT prompt. Deprecated in version 1.1.0, but reinstated in version 1.1.1.

system_name (string, default value is "System")

This string is used when providing the conversation history in the ChatGPT prompt. Deprecated in version 1.1.0, but reinstated in version 1.1.1.

prompt_template (string)

Specifies the file containing the prompt used to make ChatGPT generate a system utterance, as a relative path from the configuration file directory.

temperature (float, default value is 0.7)

The temperature parameter used when calling ChatGPT.

gpt_model (string, default value is gpt-4o-mini)

The OpenAI GPT model. You can specify gpt-4o, gpt-4o-mini, and so on.

instruction (string; see the source for the default value)

The instruction given to ChatGPT as the system role message.
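As an illustration, the parameters above might appear together in a block configuration like the following. This is a sketch under assumptions: the block name, prompt file path, and utterance text are invented for the example; the parameter names and the block class path are the ones documented above.

```yaml
blocks:
  - name: chatgpt_dialogue     # hypothetical block name
    block_class: dialbb.builtin_blocks.chatgpt.chatgpt.ChatGPT
    first_system_utterance: "Hello! What would you like to talk about?"
    user_name: "User"
    system_name: "System"
    prompt_template: prompts/chat_prompt.txt   # relative to the config directory
    gpt_model: gpt-4o-mini
    temperature: 0.7
```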
5.5.3. Placeholders in Prompt Templates

The following placeholders can be used in prompt templates.

{current_time}

Replaced with a string representing the current date, day of the week, and time (hour, minute, second) at which the dialogue is taking place.

{<a string consisting only of alphabetic characters, digits, and underscores>}

If the string exists as a key in aux_data, it is replaced with the corresponding value converted to a string.

Placeholder removal

If an unreplaced placeholder remains and is enclosed in [[[ and ]]], that portion will be removed.
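The substitution and removal rules can be sketched as follows. This is an illustrative re-implementation, not DialBB's actual code; the function name and arguments are assumptions.

```python
import re


def fill_prompt(template: str, aux_data: dict, current_time_str: str) -> str:
    # Replace {current_time} with a preformatted date/time string.
    filled = template.replace("{current_time}", current_time_str)
    # Replace {key} with str(value) for every key present in aux_data.
    for key, value in aux_data.items():
        filled = filled.replace("{" + str(key) + "}", str(value))

    # Drop any [[[ ... ]]] section that still contains an unreplaced placeholder.
    def drop_if_unfilled(match: re.Match) -> str:
        return "" if re.search(r"\{[A-Za-z0-9_]+\}", match.group()) else match.group()

    return re.sub(r"\[\[\[.*?\]\]\]", drop_if_unfilled, filled, flags=re.DOTALL)
```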
5.5.4. Process Details
At the beginning of the dialogue, the value of first_system_utterance in the block configuration is returned as the system utterance. In the second and subsequent turns, the prompt built from the template is given to ChatGPT and the returned string is used as the system utterance.
5.6. ChatGPT NER (Named Entity Recognition Block Using ChatGPT)
(dialbb.builtin_blocks.ner_with_chatgpt.chatgpt_ner.NER)
This block utilizes OpenAI’s ChatGPT to perform named entity recognition (NER).
If the language element in the configuration is set to ja, it extracts named entities in Japanese. If set to en, it extracts named entities in English.
At startup, this block reads named entity knowledge from an Excel file, converts it into a list of named entity classes, descriptions for each class, examples of named entities in each class, and extraction examples (few-shot examples), and embeds them into the prompt.
During execution, the input utterance is added to the prompt, and ChatGPT is used for named entity extraction.
5.6.1. Input and Output
Input
input_text: input string (string)

aux_data: auxiliary data (dictionary)
Output
aux_data: auxiliary data (dictionary)

The named entity extraction results are added to the input aux_data. The extracted named entities follow this format:

{"NE_<Label>": "<Named Entity>", "NE_<Label>": "<Named Entity>", ...}

<Label> represents the named entity class. The named entity is the recognized phrase found in input_text. If multiple entities of the same class are found, they are concatenated with ":".

Example:

{"NE_Person": "John:Mary", "NE_Dish": "Chicken Marsala"}
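A downstream component could unpack this structure as follows. This is a minimal sketch that relies only on the format described above; the function name is an assumption.

```python
def extract_named_entities(aux_data: dict) -> dict:
    # Collect NER results added by the block: keys look like "NE_<Label>",
    # and multiple entities of the same class are joined with ":".
    entities = {}
    for key, value in aux_data.items():
        if key.startswith("NE_"):
            entities[key[len("NE_"):]] = value.split(":")
    return entities
```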
5.6.2. Block Configuration Parameters
knowledge_file (String)

Specifies the Excel file containing named entity knowledge. The file path should be relative to the directory where the configuration file is located.

flags_to_use (List of strings)

If any of these values are present in the flag column of a sheet, the corresponding row is loaded. If this parameter is not set, all rows are loaded.

knowledge_google_sheet (Hash)

Information for using Google Sheets instead of Excel.

sheet_id (String)

The ID of the Google Sheet.

key_file (String)

Specifies the key file for accessing the Google Sheets API, relative to the configuration file directory.

gpt_model (String, default: gpt-4o-mini)

Specifies the ChatGPT model. Options include gpt-4o, etc.

prompt_template (String)

Specifies the file containing the prompt template, relative to the configuration file directory. If not specified, the default template dialbb.builtin_blocks.ner_with_chatgpt.chatgpt_ner.prompt_template_ja.PROMPT_TEMPLATE_JA (for Japanese) or dialbb.builtin_blocks.ner_with_chatgpt.chatgpt_ner.prompt_template_en.PROMPT_TEMPLATE_EN (for English) is used.

The prompt template defines how ChatGPT is instructed to perform named entity recognition and can include the following variables (prefixed with @):

@classes

The list of named entity classes.

@class_explanations

Descriptions of each named entity class.

@ne_examples

Examples of named entities for each class.

@ner_examples

Examples of utterances and their correct named entity extraction results (few-shot examples).

@input

The input utterance.
Values are assigned to these variables at runtime.
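For illustration only, a template using these variables might look like the fragment below. The wording is an assumption invented for this example; only the variable names are the ones listed above.

```
Extract named entities from the input utterance.
Named entity classes: @classes
Class descriptions: @class_explanations
Example entities per class: @ne_examples
Extraction examples: @ner_examples
Input: @input
```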
5.6.3. Named Entity Knowledge
Named entity knowledge consists of the following two sheets:
| Sheet Name | Description |
|---|---|
| utterances | Examples of utterances and named entity extraction results. |
| classes | Named entity classes, with their descriptions and examples. |
Although the sheet names can be changed in the block configuration, this is rarely needed, so detailed explanations are omitted.
5.6.3.1. utterances Sheet
Each row consists of the following columns:
flag

A flag that determines whether to use the row. Common values include Y (yes) and T (test). The configuration specifies which flags to use.

utterance

An example utterance.

entities

The named entities contained in the utterance, formatted as follows:
<Named Entity Class>=<Named Entity>, <Named Entity Class>=<Named Entity>, ... <Named Entity Class>=<Named Entity>
Example:
Person=John, Location=Chicago
Additional columns besides these are allowed in the sheets used by this block.
5.6.3.2. classes Sheet
Each row consists of the following columns:
flag

Same as in the utterances sheet.

class

The named entity class name.

explanation

A description of the named entity class.

examples

Examples of named entities, concatenated with ",".
5.7. spaCy-Based NER (Named Entity Recognizer Block using spaCy)
(dialbb.builtin_blocks.ner_with_spacy.ne_recognizer.SpaCyNER)
Performs named entity recognition using spaCy and GiNZA.
5.7.1. Input/Output
Input
input_text: input string (string)

aux_data: auxiliary data (dictionary)
Output
aux_data: auxiliary data (dictionary)

The input aux_data plus the named entity recognition results.
The result of named entity recognition is as follows.
{
"NE_<label>": "<named entity>",
"NE_<label>": "<named entity>",
...
}
<label> is the class of the named entity. <named entity> is a found named entity, a substring of input_text. If multiple named entities of the same class are found, they are concatenated with ":".
Example:
{
"NE_Person": "John:Mary",
"NE_Dish": "Chicken Marsala"
}
See the spaCy/GiNZA model websites for more information on the named entity classes.

ja-ginza-electra (5.1.2): https://pypi.org/project/ja-ginza-electra/

en_core_web_trf (3.5.0): https://spacy.io/models/en#en_core_web_trf-labels
5.7.2. Block Configuration Parameters
model (String; required)

The name of the spaCy/GiNZA model. It can be ja_ginza_electra (Japanese), en_core_web_trf (English), etc.

patterns (object; optional)

Describes rule-based named entity extraction patterns, written in YAML following the format described in the spaCy pattern documentation.
The following is an example.
patterns:
  - label: Date
    pattern: yesterday
  - label: Date
    pattern: The day before yesterday
5.7.3. Process Details
Extracts the named entities in input_text using spaCy/GiNZA and returns the result in aux_data.
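The shape of the result can be reproduced with a small helper. This is an illustrative sketch, not the block's actual code; in the real block the (label, text) pairs would come from spaCy entities, e.g. ((ent.label_, ent.text) for ent in nlp(input_text).ents).

```python
def aggregate_entities(labeled_spans) -> dict:
    # labeled_spans: iterable of (label, text) pairs.
    # Builds the "NE_<label>" dictionary described above, joining
    # multiple entities of the same class with ":".
    result = {}
    for label, text in labeled_spans:
        key = "NE_" + label
        result[key] = result[key] + ":" + text if key in result else text
    return result
```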