5. Built-in Block Classes
Built-in block classes are block classes that are included in DialBB in advance.
Explanations of the blocks that handle only Japanese are omitted below.
5.1. Simple Canonicalizer (Simple String Canonicalizer Block)
(dialbb.builtin_blocks.preprocess.simple_canonicalizer.SimpleCanonicalizer)
Canonicalizes user input sentences. The main target language is English.
5.1.1. Input/Output
Input
input_text: input string (string). Example: "I like ramen".
Output
output_text: string after canonicalization (string). Example: "i like ramen".
5.1.2. Process Details
Performs the following processing on the input string.
Deletes leading and trailing spaces.
Replaces upper-case alphabetic characters with lower-case characters.
Deletes line breaks.
Converts a sequence of spaces into a single space.
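The steps above can be sketched in Python (a minimal illustration, not the actual DialBB implementation):

```python
import re

def canonicalize(text: str) -> str:
    """Minimal sketch of the Simple Canonicalizer's processing steps."""
    text = text.replace("\n", " ")    # delete line breaks
    text = text.lower()               # upper-case -> lower-case
    text = re.sub(r" +", " ", text)   # collapse a sequence of spaces
    return text.strip()               # delete leading/trailing spaces
```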
5.2. LR-CRF Understander (Language Understanding Block using Logistic Regression and Conditional Random Fields)
(dialbb.builtin_blocks.understanding_with_lr_crf.lr_crf_understander.Understander)
Determines the user utterance type (also called intent) and extracts the slots using logistic regression and conditional random fields.
Performs language understanding in Japanese if the language element of the configuration is ja, and in English if it is en.
At startup, this block reads the knowledge for language understanding written in Excel and trains the models for logistic regression and conditional random fields.
At runtime, it uses the trained models for language understanding.
5.2.1. Input/Output
Input
tokens: list of tokens (list of strings). Example: ['I', 'like', 'chicken', 'salad', 'sandwiches'].
Output
nlu_result: language understanding result (dict or list of dicts)

If the parameter num_candidates of the block configuration described below is 1, the language understanding result is a dictionary in the following format:

{"type": <user utterance type (intent)>, "slots": {<slot name>: <slot value>, ..., <slot name>: <slot value>}}

The following is an example.

{"type": "tell-like-specific-sandwich", "slots": {"favorite-sandwich": "roast beef sandwich"}}

If num_candidates is greater than 1, it is a list of multiple candidate understanding results:

[{"type": <user utterance type (intent)>, "slots": {<slot name>: <slot value>, ...}},
 {"type": <user utterance type (intent)>, "slots": {<slot name>: <slot value>, ...}},
 ...]
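For instance, with num_candidates set to 2, nlu_result might look like this (the values are illustrative):

```python
# Hypothetical 2-best language understanding result;
# the first element is the top-ranked candidate.
nlu_result = [
    {"type": "tell-like-specific-sandwich",
     "slots": {"favorite-sandwich": "roast beef sandwich"}},
    {"type": "tell-like-sandwich", "slots": {}},
]
```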
5.2.2. Block Configuration Parameters
knowledge_file (string)
Specifies the Excel file that describes the knowledge. The file path must be relative to the directory where the configuration file is located.

flags_to_use (list of strings)
Specifies the flags to be used. A row of a sheet is read only if one of these values is written in its flag column. If this parameter is not set, all rows are read.

canonicalizer
Specifies the canonicalization to be performed when converting the language understanding knowledge into training data.

class
Specifies the class of the canonicalization block. Basically, the same canonicalization block used in the application is specified.

num_candidates (integer. Default value is 1)
Specifies the maximum number of language understanding results (n for n-best).

knowledge_google_sheet (hash)
Specifies information for using Google Sheets instead of Excel.

sheet_id (string)
Google Sheet ID.

key_file (string)
Specifies the key file for accessing the Google Sheets API, as a relative path from the configuration file directory.
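Putting these parameters together, the block's entry in the configuration file might look as follows (the block name and file names are illustrative, and the surrounding keys follow the usual DialBB block configuration format):

```yaml
- name: understander
  block_class: dialbb.builtin_blocks.understanding_with_lr_crf.lr_crf_understander.Understander
  knowledge_file: nlu-knowledge.xlsx
  flags_to_use:
    - Y
  canonicalizer:
    class: dialbb.builtin_blocks.preprocess.simple_canonicalizer.SimpleCanonicalizer
  num_candidates: 3
```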
5.2.3. Language Understanding Knowledge
Language understanding knowledge consists of the following two sheets.
sheet name | contents
---|---
utterances | examples of utterances by type
slots | relationship between slots and entities, and lists of synonyms
The sheet name can be changed in the block configuration, but since it is unlikely to be changed, a detailed explanation is omitted.
5.2.3.1. utterances sheet
Each row consists of the following columns.

flag
Flag indicating whether the row is used. Y (yes), T (test), etc. are often written. Which flags' rows to use is specified in the configuration. In the configuration of the sample application, all rows are used.

type
User utterance type (intent).

utterance
Example utterance.

slots
Slots included in the utterance, written in the following form:

<slot name>=<slot value>, <slot name>=<slot value>, ..., <slot name>=<slot value>

The following is an example.

location=philadelphia, favorite-sandwich=cheesesteak sandwich

The sheets that this block uses, including the utterances sheet, can have columns other than these.
5.2.3.2. slots sheet
Each row consists of the following columns.

flag
Same as in the utterances sheet.

slot name
Slot name. It is used in the example utterances in the utterances sheet and in the language understanding results.

entity
The name of the dictionary entry. It is also included in language understanding results.

synonyms
Synonyms joined by ','.
5.3. ChatGPT Understander (Language Understanding Block using ChatGPT)
(dialbb.builtin_blocks.understanding_with_chatgpt.chatgpt_understander.Understander)
Determines the user utterance type (also called intent) and extracts the slots using OpenAI’s ChatGPT.
Performs language understanding in Japanese if the language element of the configuration is ja, and in English if it is en.
At startup, this block reads the knowledge for language understanding written in Excel and converts it into the list of user utterance types, the list of slots, and the few-shot examples to be embedded in the prompt.
At runtime, the input utterance is added to the prompt so that ChatGPT performs language understanding.
5.3.1. Input/Output
Input
input_text: input string. The input string is assumed to be canonicalized. Example: "I like chicken salad sandwiches".

Output
nlu_result: language understanding result (dict)

```json
{"type": <user utterance type (intent)>, "slots": {<slot name>: <slot value>, ..., <slot name>: <slot value>}}
```

The following is an example.

```json
{"type": "tell-like-specific-sandwich", "slots": {"favorite-sandwich": "roast beef sandwich"}}
```
5.3.2. Block Configuration Parameters
knowledge_file (string)
Specifies the Excel file that describes the knowledge. The file path must be relative to the directory where the configuration file is located.

flags_to_use (list of strings)
Specifies the flags to be used. A row of a sheet is read only if one of these values is written in its flag column. If this parameter is not set, all rows are read.

canonicalizer
Specifies the canonicalization to be performed when converting the language understanding knowledge into few-shot examples.

class
Specifies the class of the canonicalization block. Basically, the same canonicalization block used in the application is specified.

knowledge_google_sheet (hash)
Specifies information for using Google Sheets instead of Excel.

sheet_id (string)
Google Sheet ID.

key_file (string)
Specifies the key file for accessing the Google Sheets API, as a relative path from the configuration file directory.

gpt_model (string. Default value is gpt-4o-mini)
Specifies the ChatGPT model. gpt-4o can be specified. gpt-4 cannot be used.

prompt_template (string)
Specifies the prompt template file as a relative path from the configuration file directory.
When this is not specified, dialbb.builtin_blocks.understanding_with_chatgpt.prompt_templates_ja.PROMPT_TEMPLATE_JA (for Japanese) or dialbb.builtin_blocks.understanding_with_chatgpt.prompt_templates_en.PROMPT_TEMPLATE_EN (for English) is used.
A prompt template is a template of the prompts used to make ChatGPT perform language understanding, and it can contain the following variables starting with @.

@types
The list of utterance types.

@slot_definitions
The list of slot definitions.

@examples
So-called few-shot examples, each of which has an example utterance, its utterance type, and its slots.

@input
The input utterance.
Values are assigned to these variables at runtime.
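For illustration, a minimal English prompt template using these variables might look like the following (this is a sketch; the actual built-in templates are more elaborate):

```
Determine the type and slots of the user utterance.

Utterance types:
@types

Slot definitions:
@slot_definitions

Examples:
@examples

Input utterance: @input
```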
5.3.3. Language Understanding Knowledge
The description format of the language understanding knowledge in this block is exactly the same as that of the LR-CRF Understander. For more details, please refer to “Language Understanding Knowledge” in the explanation of LR-CRF Understander.
5.4. STN Manager (State Transition Network-based Dialogue Management Block)
(dialbb.builtin_blocks.stn_manager.stn_management)
It performs dialogue management using a state-transition network.
Input
sentence: user utterance after canonicalization (string)
nlu_result: language understanding result (dictionary or list of dictionaries)
user_id: user ID (string)
aux_data: auxiliary data (dictionary) (not required, but specifying this is recommended)

Output
output_text: system utterance (string). Example: "So you like chicken salad sandwiches."
final: a flag indicating whether the dialogue has finished (bool)
aux_data: auxiliary data (dictionary). The auxiliary data of the input may be updated in the action functions described below (updates are not necessarily performed), and the ID of the state transitioned to is added in the following format:

{"state": "I like a particular ramen"}
5.4.1. Block configuration parameters
knowledge_file (string)
Specifies the Excel file describing the scenario, as a relative path from the directory where the configuration file exists.

function_definitions (string)
The name of the module that defines the scenario functions. If there are multiple modules, connect them with ':'. The modules must be in the Python module search path. (The directory containing the configuration file is in the module search path.)

flags_to_use (list of strings)
Same as for the understanding blocks.

knowledge_google_sheet (object)
Same as for the understanding blocks.

scenario_graph (boolean. Default value is False)
If this value is true, the values in the system utterance and user utterance example columns of the scenario sheet are used to create the graph. This allows the scenario writer to see the state transition network intuitively.

repeat_when_no_available_transitions (boolean. Default value is false)
When this value is true, if there is no transition that matches the conditions, the same utterance is repeated without a transition.

multi_party (boolean. Default value is false)
When this value is set to true, the value of user_id is included in the dialogue history of Section 5.4.4.1 and in the prompts of the built-in functions using large language models described below.
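A block configuration using these parameters might look like this (the block name, file names, and the Manager class name are illustrative):

```yaml
- name: manager
  block_class: dialbb.builtin_blocks.stn_manager.stn_management.Manager
  knowledge_file: scenario.xlsx
  function_definitions: scenario_functions
  repeat_when_no_available_transitions: false
```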
5.4.2. Dialogue Management Knowledge Description
The dialog management knowledge (scenario) is written in the scenario sheet in the Excel file.
Each row of the sheet represents a transition. Each row consists of the following columns.
flag
Same as in the utterances sheet.

state
The name of the source state of the transition.

system utterance
Candidates for the system utterance generated in the state. {<variable>} or {<function call>} in the system utterance string is replaced by the value assigned to the variable during the dialogue or by the return value of the function call. This is explained in detail in "Variables and Function Calls in System Utterances". There can be multiple rows with the same state; all system utterance values in the rows having the same state become system utterance candidates, and one of them is chosen randomly.

user utterance example
Example of a user utterance. It is written only to aid understanding of the flow of the dialogue, and is not used by the system.

user utterance type
The user utterance type obtained by language understanding. It is used as a condition of the transition.

conditions
A sequence of conditions. Each condition is a function call representing a condition for the transition. There can be more than one; multiple conditions are concatenated with ';'. Each condition has the form <function name>(<argument 1>, <argument 2>, ..., <argument n>). The number of arguments can be zero. See "Function arguments" for the arguments that can be used in each condition.

actions
A sequence of actions, which are function calls executed when the transition occurs. If there is more than one, they are concatenated with ';'. Each action has the form <function name>(<argument 1>, <argument 2>, ..., <argument n>). The number of arguments can be zero. See "Function arguments" for the arguments that can be used in each action.

next state
The name of the destination state of the transition.
There can be other columns on this sheet (for use as notes).
If the user utterance type of the transition represented by a row is empty or matches the result of language understanding, and the conditions are empty or all of them are satisfied, the condition for the transition is satisfied and the transition is made to the next state. In this case, the actions described in actions are executed.
Rows with the same state column value (transitions with the same source state) are checked to see whether they satisfy the transition conditions, starting from the topmost row.
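This row-selection procedure can be sketched in Python (the data structures are hypothetical; each row is a dictionary keyed by the scenario sheet's column names):

```python
from typing import Any, Dict, List, Optional

def select_transition(rows: List[Dict[str, Any]], utterance_type: str,
                      conditions_hold) -> Optional[Dict[str, Any]]:
    """Return the first row (top to bottom) whose type and conditions match.

    rows: transitions sharing the same source state, in sheet order.
    conditions_hold: callable that evaluates a row's 'conditions' cell.
    """
    for row in rows:
        type_ok = (not row.get("user utterance type")
                   or row["user utterance type"] == utterance_type)
        cond_ok = (not row.get("conditions")
                   or conditions_hold(row["conditions"]))
        if type_ok and cond_ok:
            return row   # its actions run, then the dialogue moves to 'next state'
    return None          # no matching row; a default transition normally prevents this
```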
The default transition (a row with both the user utterance type and conditions columns empty) must be at the bottom of the rows having the same state column value.
Unless repeat_when_no_available_transitions is True, the default transition is necessary.
5.4.3. Special states
The following state names are predefined.
#prep
Preparation state. If this state exists, a transition from this state is attempted when the dialogue begins (when the client first accesses the system). The system checks whether all conditions in the conditions column of the rows whose state column is #prep are met. If they are, the actions in that row's actions column are executed, the system transitions to the state in its next state column, and the system utterance for that state is output.
This is used to change the initial system utterance and state according to the situation. The Japanese sample application changes the content of the greeting depending on the time of day when the dialogue takes place.
This state is not required.

#initial
Initial state. If there is no #prep state, the dialogue starts from this state when it begins (when the client first accesses the system). The system utterance for this state is placed in output_text and returned to the main process.
There must be either a #prep or an #initial state.

#error
The system moves to this state when an internal error occurs. It generates a system utterance and exits.

A state ID beginning with #final, such as #final_say_bye, indicates a final state. In a final state, the system generates a system utterance and terminates the dialogue.
5.4.4. Conditions and Actions
5.4.4.1. Context information
STN Manager maintains context information for each dialogue session. The context information is a set of variables and their values (a Python dictionary), and the values can be any data structure.
Condition and action functions access context information.
The context information is pre-set with the following key-value pairs.
key | value
---|---
_current_state_name | name of the state before the transition (string)
_config | dictionary created by reading the configuration file
_block_config | the part of the configuration file for the dialogue management block (dictionary)
_aux_data | aux_data (dictionary) received from the main process
_previous_system_utterance | previous system utterance (string)
_dialogue_history | dialogue history (list)
_turns_in_state | the number of user turns in the current state (integer)
The dialogue history has the following form.
[
  {
    "speaker": "user",
    "utterance": <canonicalized user utterance (string)>
  },
  {
    "speaker": "system",
    "utterance": <system utterance (string)>
  },
  {
    "speaker": "user",
    "utterance": <canonicalized user utterance (string)>
  },
  ...
]
In addition to these, new key/value pairs can be added within the action function.
5.4.4.2. Function arguments
The arguments of the functions used in conditions and actions are of the following types.

Special variables (strings beginning with #)
The following types are available.

#<slot name>
Slot value in the language understanding result of the previous user utterance (the value of the input nlu_result). If the slot value is empty, it is an empty string.

#<key of auxiliary data>
The value of this key in the input aux_data. For example, #emotion is the value of aux_data['emotion']. If this key is missing, it is an empty string.

#sentence
The immediately previous user utterance (canonicalized).

#user_id
The user ID string.

Variables (strings beginning with *)
The value of a variable in the context information, in the form *<variable name>. The value of the variable must be a string. If the variable is not in the context information, it is an empty string.

Variable references (strings beginning with &)
Refer to a context variable in function definitions, in the form &<context variable name>.

Constants (strings enclosed in "")
A constant means the string as it is.
5.4.5. Variables and Function Calls in System Utterances
In system utterances, parts enclosed in { and } are variables or function calls, and are replaced by the value of the variable or by the return value of the function call.
Variables that start with # are the special variables mentioned above. Other variables are normal variables, which are supposed to be present in the context information. If such a variable does not exist, the variable name is used as is, without replacement.
Function calls can take the arguments explained above for functions used in conditions and actions. The return value must be a string.
5.4.6. Function Definitions
Functions used in conditions and actions are either built into DialBB or defined by the developers. A function used in a condition returns a boolean value, while a function used in an action returns nothing.
5.4.6.1. Built-in functions
The built-in functions are as follows:
Functions used in conditions

_eq(x, y)
Returns True if x and y are the same.
e.g., _eq(*a, "b") returns True if the value of variable a is "b". _eq(#food, "sandwich") returns True if the #food slot value is "sandwich".

_ne(x, y)
Returns True if x and y are not the same.
e.g., _ne(#food, "ramen") returns False if the #food slot is "ramen".

_contains(x, y)
Returns True if x contains y as a string.
e.g., _contains(#sentence, "yes") returns True if the user utterance contains "yes".

_not_contains(x, y)
Returns True if x does not contain y as a string.
e.g., _not_contains(#sentence, "yes") returns True if the user utterance does not contain "yes".

_member_of(x, y)
Returns True if the list formed by splitting y by ':' contains the string x.
e.g., _member_of(#food, "ramen:fried rice:dumplings")

_not_member_of(x, y)
Returns True if the list formed by splitting y by ':' does not contain the string x.
e.g., _not_member_of(*favorite_food, "ramen:fried rice:dumplings")

_num_turns_exceeds(n)
Returns True when the number of user turns exceeds the integer represented by the string n.
e.g., _num_turns_exceeds("10")

_num_turns_in_state_exceeds(n)
Returns True when the number of user turns in the current state exceeds the integer represented by the string n.
e.g., _num_turns_in_state_exceeds("5")

_check_with_llm(task)
Makes the judgment using a large language model. Details are given below.
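The semantics of the list-membership checks above can be sketched as follows (illustrative Python, not the actual DialBB code):

```python
def member_of(x: str, y: str) -> bool:
    """True if x is one of the ':'-separated items in y
    (sketch of _member_of's semantics)."""
    return x in y.split(":")

def not_member_of(x: str, y: str) -> bool:
    """Sketch of _not_member_of's semantics."""
    return x not in y.split(":")
```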
Functions used in actions

_set(x, y)
Sets y to the variable x.
e.g., _set(&a, b) sets the value of b to a. _set(&a, "hello") sets "hello" to a.
Functions used in system utterances
_generate_with_llm(task)
Generates a string using a large language model (currently only OpenAI’s ChatGPT). More details follow.
5.4.6.2. Built-in functions using large language models
The functions _check_with_llm(task)
and _generate_with_llm(task)
use a large language model (currently only OpenAI’s ChatGPT) along with dialogue history to perform condition checks and text generation. Here are some examples:
Example of a condition check:
_check_with_llm("Please determine if the user said the reason.")
Example of text generation:
_generate_with_llm("Generate a sentence to say it's time to end the talk by continuing the conversation in 50 words.")
To use these functions, the following settings are required:
Set OpenAI’s API key in the environment variable OPENAI_API_KEY. Please check websites and other resources to find out how to obtain an API key from OpenAI.
Add the following elements to the chatgpt block configuration:
gpt_model (string)
Specifies the model name of GPT, such as gpt-4o or gpt-4o-mini. The default value is gpt-4o-mini. gpt-4 cannot be used.

temperature (float)
Specifies the temperature parameter for GPT. The default value is 0.7.

situation (list of strings)
A list that enumerates the situations to be written in the GPT prompt. If this element is absent, no specific situation is specified.

persona (list of strings)
A list that enumerates the system persona to be written in the GPT prompt. If this element is absent, no specific persona is specified.

e.g.:

chatgpt:
  gpt_model: gpt-4-turbo
  temperature: 0.7
  situation:
    - You are a dialogue system and chatting with the user.
    - You met the user for the first time.
    - You and the user are similar in age.
    - You and the user talk in a friendly manner.
  persona:
    - Your name is Yui
    - 28 years old
    - Female
    - You like sweets
    - You don't drink alcohol
    - A web designer working for an IT company
    - Single
    - You talk very friendly
    - Diplomatic and cheerful
5.4.6.3. Syntax sugars for built-in functions
Syntax sugars are provided to simplify the description of built-in functions.
<variable name>==<value>
This means _eq(<variable name>, <value>).
e.g., #favorite_sandwich=="chicken salad sandwich"

<variable name>!=<value>
This means _ne(<variable name>, <value>).
e.g., #NE_Person!=""

<variable name>=<value>
This means _set(&<variable name>, <value>).
e.g., user_name=#NE_Person

$<task string>
When used as a condition, it means _check_with_llm(<task string>); when used in a system utterance enclosed in {}, it means _generate_with_llm(<task string>).
Example of a condition: $"Please determine if the user said the reason."
Example of a text generation function call in a system utterance: I understand. {$"Generate a sentence to say it's time to end the talk by continuing the conversation in 50 words"} Thank you for your time.
5.4.6.4. Function definitions by the developers
When developers define functions, they edit a file specified in the function_definitions element of the block configuration.

from typing import Any, Dict

def get_ramen_location(ramen: str, variable: str, context: Dict[str, Any]) -> None:
    # ramen_map is a module-level dictionary defined elsewhere in the file
    location: str = ramen_map.get(ramen, "Japan")
    context[variable] = location

In addition to the arguments used in the scenario, an argument of dictionary type must be added to receive the context information.
All arguments used in the scenario must be strings.
In the case of a special variable or a variable, the value of the variable is passed as an argument.
In the case of a variable reference, the variable name without the '&' is passed, and in the case of a constant, the string inside the "" is passed.
5.4.7. Reaction
In an action function, setting a string to _reaction
in the context information will prepend that string to the system’s response after the state transition.
For example, if the action function _set(&_reaction, "I agree.")
is executed and the system’s response in the subsequent state is “How was the food?”, then the system will return the response “I agree. How was the food?”.
5.4.8. Continuous Transition
If a transition is made to a state where the first system utterance is $skip
, the next transition is made immediately without returning a system response. This is used in cases where the second transition is selected based on the result of the action of the first transition.
5.4.9. Dealing with Multiple Language Understanding Results
If the input nlu_result
is a list that contains multiple language understanding results, the process is as follows.
Starting from the top of the list, the block checks whether the type value of each candidate language understanding result is equal to the user utterance type value of one of the possible transitions from the current state, and uses the first candidate for which such a transition exists. If none of the candidates meets this condition, the first language understanding result in the list is used.
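This selection procedure can be sketched as follows (the function and data structures are hypothetical):

```python
from typing import Any, Dict, List

def choose_nlu_candidate(candidates: List[Dict[str, Any]],
                         expected_types: List[str]) -> Dict[str, Any]:
    """Pick the first candidate whose type matches a transition from
    the current state; otherwise fall back to the top candidate."""
    for candidate in candidates:
        if candidate["type"] in expected_types:
            return candidate
    return candidates[0]
```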
5.4.10. Subdialogue
If the destination state name is of the form #gosub:<state name 1>:<state name 2>, the system transitions to the state <state name 1> and executes the subdialogue starting there. When the destination state becomes :exit, it moves to the state <state name 2>.
For example, if the destination state name is #gosub:request_confirmation:confirmed, a subdialogue starting with request_confirmation is executed, and when the destination state becomes :exit, the dialogue returns to confirmed.
It is also possible to transition to a subdialogue within a subdialogue.
5.4.11. Saving Context Information in an External Database
When operating the DialBB application as a web server, using a load balancer to distribute processing across multiple instances can handle request surges efficiently. By saving context information in an external database (MongoDB), a single session can be processed by different instances. (Feature added in version 0.10.0)
To use an external database, specify context_db
element like the following in the block configuration:
context_db:
host: localhost
port: 27017
user: admin
password: password
Each key is defined as follows:
host (string)
The hostname where MongoDB is running.

port (integer. Default value is 27017)
The port number used to access MongoDB.

user (string)
The username for accessing MongoDB.

password (string)
The password for accessing MongoDB.
5.4.12. Advanced Mechanisms for Handling Speech Input
5.4.12.1. Additional block configuration parameters
input_confidence_threshold (float. Default value is 1.0)
If the input is a speech recognition result and its confidence is less than this value, the confidence is considered low. The confidence of the input is the value of confidence in aux_data. If there is no confidence key in aux_data, the confidence is considered high. When the confidence is low, the processing depends on the values of the parameters described below.

confirmation_request (object)
This is specified in the following form:

confirmation_request:
  function_to_generate_utterance: <function name (string)>
  acknowledgement_utterance_type: <user utterance type name for acknowledgement (string)>
  denial_utterance_type: <user utterance type name for denial (string)>

If this is specified, when the confidence of the input is low, the function specified in function_to_generate_utterance is executed and its return value is spoken (called a confirmation request utterance) instead of making a state transition. The next user utterance is then processed as follows:

If the confidence of the user utterance is low, no transition is made and the previous system utterance is repeated.
If the user utterance type is the one specified by acknowledgement_utterance_type, the transition is made according to the user utterance before the confirmation request utterance.
If the user utterance type is the one specified by denial_utterance_type, no transition is made and the utterance in the original state is repeated.
If the user utterance type is anything else, a normal transition is performed.

However, if the input is a barge-in utterance (aux_data has a barge_in element whose value is True), this processing is not performed.
The function specified by function_to_generate_utterance is defined in the module specified by function_definitions in the block configuration. The arguments of the function are the nlu_result of the block's input and the context information. The return value is the system utterance string.

utterance_to_ask_repetition (string)
If this is specified, then when the input confidence is low, no state transition is made and the value of this element is used as the system utterance. However, in the case of barge-in (aux_data has a barge_in element whose value is True), this processing is not performed.
confirmation_request and utterance_to_ask_repetition cannot be specified at the same time.

ignore_out_of_context_barge_in (boolean. Default value is False)
If this value is True and the input is a barge-in utterance (the value of barge_in in the aux_data of the request is True), no transition is made when the conditions for a transition other than the default transition are not met (i.e. the input is not expected in the scenario) or when the confidence of the input is low. In this case, barge_in_ignored in the response aux_data is set to True.

reaction_to_silence (object)
It has an action element. The value of the action element is a string, either repeat or transition. If the value of the action element is "transition", a destination element is also required; its value is a string.
If the input aux_data has a long_silence key whose value is True, and the conditions for a transition other than the default transition are not met, then the block behaves as follows, depending on this parameter:

If this parameter is not specified, a normal state transition is performed.
If the value of action is "repeat", the previous system utterance is repeated without a state transition.
If the value of action is "transition", the transition is made to the state specified by destination.
5.4.12.2. Adding built-in condition functions
The following built-in condition functions have been added.

_confidence_is_low()
Returns True if the value of confidence in the input aux_data is less than or equal to the value of input_confidence_threshold in the configuration.

_is_long_silence()
Returns True if the value of long_silence in the input aux_data is True.
5.4.12.3. Ignoring the last incorrect input
If the value of rewind
in the input aux_data
is True
, a transition is made from the state before the last response.
Any changes to the dialog context due to actions taken during the previous response will also be undone.
This function is used when a user utterance is accidentally split in the middle during speech recognition and only the first half of the utterance is responded to.
Note that although the context information is reverted, changes made in action functions to global variables or to the contents of an external database are not.
5.5. ChatGPT Dialogue (ChatGPT-based Dialogue Block)
(Changed in ver0.7)
(dialbb.builtin_blocks.chatgpt.chatgpt.ChatGPT
)
Engages in dialogue using OpenAI’s ChatGPT.
5.5.1. Input/Output
Input
user_utterance: user utterance (string)
aux_data: auxiliary data (dictionary)
user_id: user ID (string)

Output
system_utterance: system utterance (string)
aux_data: auxiliary data (dictionary)
final: a boolean flag indicating whether the dialogue has finished
The inputs aux_data
and user_id
are not used.
The output aux_data
is the same as the input aux_data
and final
is always False
.
When using this block, you need to set the OpenAI API key in the environment variable OPENAI_API_KEY.
5.5.2. Block Configuration Parameters
first_system_utterance
(string, default value is""
)This is the first system utterance of the dialog.
user_name
(string, default value is"User"
)This is used for the ChatGPT prompt. It is explained below.
system_name
(string, default value is"System"
)This is used for the ChatGPT prompt. It is explained below.
prompt_template
(string)This specifies the prompt template file as a relative path from the configuration file directory.
A prompt template is a template of prompts for making ChatGPT generate a system utterance, and it can contain the following variables starting with
@
.@dialogue_history
Dialogue history. This is replaced by a string in the following form:<The value of system_name in the block configuration>: <system utterance> <The value of user_name in the block configuration>: <user utterance> <The value of system_name in the block configuration>: <system utterance> <The value of user_name in the block configuration>: <user utterance> ... <The value of system_name in the block configuration>: <system utterance> <The value of user_name in the block configuration>: <user utterance>
gpt_model (string, default value is gpt-4o-mini)
  The OpenAI GPT model. You can specify gpt-4o, gpt-4o-mini, and so on.
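To illustrate how the prompt template and the two name parameters fit together, here is a minimal sketch of the @dialogue_history substitution. This is a re-implementation for illustration only; render_prompt is not a DialBB function:

```python
def render_prompt(template: str, history: list,
                  user_name: str = "User", system_name: str = "System") -> str:
    """Replace @dialogue_history in the template with formatted turns.

    `history` is a list of (speaker, utterance) pairs, where speaker is
    either "user" or "system".
    """
    lines = []
    for speaker, utterance in history:
        name = user_name if speaker == "user" else system_name
        lines.append(f"{name}: {utterance}")
    return template.replace("@dialogue_history", "\n".join(lines))

prompt = render_prompt(
    "You are a friendly chatbot.\n@dialogue_history\nSystem:",
    [("system", "Hello!"), ("user", "Hi, I like ramen.")],
)
```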
5.5.3. Process Details
At the beginning of the dialogue, the value of first_system_utterance in the block configuration is returned as the system utterance.
In the second and subsequent turns, the prompt template, in which @dialogue_history has been replaced by the dialogue history, is given to ChatGPT, and the returned string is used as the system utterance.
5.6. ChatGPT NER (Named Entity Recognition Block Using ChatGPT)
(dialbb.builtin_blocks.ner_with_chatgpt.chatgpt_ner.NER)
This block utilizes OpenAI’s ChatGPT to perform named entity recognition (NER).
If the language
element in the configuration is set to ja
, it extracts named entities in Japanese. If set to en
, it extracts named entities in English.
At startup, this block reads named entity knowledge from an Excel file, converts it into a list of named entity classes, descriptions for each class, examples of named entities in each class, and extraction examples (few-shot examples), and embeds them into the prompt.
During execution, the input utterance is added to the prompt, and ChatGPT is used for named entity extraction.
5.6.1. Input and Output
Input
input_text: input string (string)
aux_data: auxiliary data (dictionary)
Output
aux_data: auxiliary data (dictionary)
  The named entity extraction results are added to the input aux_data.
  The extracted named entities are added in the following format:

  {"NE_<Label>": "<Named Entity>", "NE_<Label>": "<Named Entity>", ...}

  <Label> represents the named entity class, and the named entity is the recognized phrase found in input_text. If multiple entities of the same class are found, they are concatenated with ':'.

  Example: {"NE_Person": "John:Mary", "NE_Dish": "Chicken Marsala"}
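Because entities of the same class arrive joined by ':', downstream code typically splits them back into lists. A minimal sketch; parse_ne_results is a hypothetical helper, not part of DialBB:

```python
def parse_ne_results(aux_data: dict) -> dict:
    """Collect the NE_* entries of aux_data into {class: [entities]}."""
    result = {}
    for key, value in aux_data.items():
        if key.startswith("NE_"):
            result[key[len("NE_"):]] = value.split(":")
    return result

parsed = parse_ne_results({"NE_Person": "John:Mary", "NE_Dish": "Chicken Marsala"})
# parsed == {"Person": ["John", "Mary"], "Dish": ["Chicken Marsala"]}
```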
5.6.2. Block Configuration Parameters
knowledge_file (string)
  Specifies the Excel file containing the named entity knowledge, as a relative path from the directory where the configuration file is located.
flags_to_use (list of strings)
  If any of these values appear in the flag column of a sheet, the corresponding row is loaded. If this parameter is not set, all rows are loaded.
knowledge_google_sheet (hash)
  Information for using Google Sheets instead of Excel.
  sheet_id (string)
    The ID of the Google Sheet.
  key_file (string)
    Specifies the key file for accessing the Google Sheets API, as a relative path from the configuration file directory.
gpt_model (string, default: gpt-4o-mini)
  Specifies the ChatGPT model. Options include gpt-4o, etc.
prompt_template (string)
  Specifies the file containing the prompt template, as a relative path from the configuration file directory.
  If not specified, the default template dialbb.builtin_blocks.ner_with_chatgpt.chatgpt_ner.prompt_template_ja.PROMPT_TEMPLATE_JA (for Japanese) or dialbb.builtin_blocks.ner_with_chatgpt.chatgpt_ner.prompt_template_en.PROMPT_TEMPLATE_EN (for English) is used.
  The prompt template defines how ChatGPT is instructed to perform named entity recognition and can include the following variables (prefixed with @):
  @classes
    The list of named entity classes.
  @class_explanations
    The description of each named entity class.
  @ne_examples
    Examples of named entities for each class.
  @ner_examples
    Examples of utterances and their correct named entity extraction results (few-shot examples).
  @input
    The input utterance.
Values are assigned to these variables at runtime.
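The runtime substitution can be pictured as simple string replacement. The sketch below is illustrative, not DialBB's actual code:

```python
def fill_template(template: str, values: dict) -> str:
    """Replace each @name in the template with its value.

    Longer names are replaced first so that a variable that happens to be
    a prefix of another is not substituted prematurely.
    """
    for name in sorted(values, key=len, reverse=True):
        template = template.replace("@" + name, values[name])
    return template

filled = fill_template(
    "Classes: @classes\nUtterance: @input",
    {"classes": "Person, Dish", "input": "I met John."},
)
```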
5.6.3. Named Entity Knowledge
Named entity knowledge consists of the following two sheets:
| Sheet Name | Description |
|---|---|
| utterances | Examples of utterances and named entity extraction results. |
| classes | The list of named entity classes, with a description and examples for each class. |
Although the sheet names can be changed in the block configuration, this is rarely needed, so detailed explanations are omitted.
5.6.3.1. utterances Sheet
Each row consists of the following columns:
flag
  A flag to determine whether to use the row. Common values include Y (yes) and T (test). The configuration specifies which flags to use.
utterance
  An example utterance.
entities
  Named entities contained in the utterance, formatted as follows:

  <Named Entity Class>=<Named Entity>, <Named Entity Class>=<Named Entity>, ..., <Named Entity Class>=<Named Entity>

  Example: Person=John, Location=Chicago
Additional columns besides these are allowed in the sheets used by this block.
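When processing such a sheet yourself (for instance, to validate the knowledge), the entities cell can be parsed as sketched below; parse_entities is a hypothetical helper, not a DialBB function:

```python
def parse_entities(cell: str) -> list:
    """Parse an entities cell such as "Person=John, Location=Chicago"
    into a list of (class, entity) pairs."""
    pairs = []
    for item in cell.split(","):
        item = item.strip()
        if item:
            cls, _, value = item.partition("=")
            pairs.append((cls.strip(), value.strip()))
    return pairs
```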
5.6.3.2. classes Sheet
Each row consists of the following columns:
flag
  Same as in the utterances sheet.
class
  The named entity class name.
explanation
  A description of the named entity class.
examples
  Examples of named entities, concatenated with ','.
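Similarly, the comma-separated examples cell can be split into a list (an illustrative sketch, not DialBB code):

```python
def parse_examples(cell: str) -> list:
    """Split an examples cell like "ramen, udon, soba" into a list of strings."""
    return [item.strip() for item in cell.split(",") if item.strip()]
```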
5.7. spaCy-Based NER (Named Entity Recognizer Block using spaCy)
(dialbb.builtin_blocks.ner_with_spacy.ne_recognizer.SpaCyNER)
Performs named entity recognition using spaCy and GiNZA.
5.7.1. Input/Output
Input
input_text: input string (string)
aux_data: auxiliary data (dictionary)
Output
aux_data: auxiliary data (dictionary)
  The input aux_data plus the named entity recognition results.
The result of named entity recognition is as follows.
{
"NE_<label>": "<named entity>",
"NE_<label>": "<named entity>",
...
}
<label> is the class of the named entity. <named entity> is a found named entity, a substring of input_text. If multiple named entities of the same class are found, they are concatenated with ':'.
Example:
{
"NE_Person": "John:Mary",
"NE_Dish": "Chicken Marsala"
}
See the spaCy/GiNZA model websites for more information on the named entity classes.
ja-ginza-electra (5.1.2): https://pypi.org/project/ja-ginza-electra/
en_core_web_trf (3.5.0): https://spacy.io/models/en#en_core_web_trf-labels
5.7.2. Block Configuration Parameters
model (string, required)
  The name of the spaCy/GiNZA model. It can be ja_ginza_electra (Japanese), en_core_web_trf (English), etc.
patterns (object, optional)
  Describes rule-based named entity extraction patterns. The patterns are a YAML version of the format described in spaCy Pattern Description.
  The following is an example:

  patterns:
    - label: Date
      pattern: yesterday
    - label: Date
      pattern: The day before yesterday
5.7.3. Process Details
Extracts the named entities in input_text
using spaCy/GiNZA and returns the result in aux_data
.
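The packing of recognition results into aux_data can be sketched as follows. The (label, text) pairs stand in for spaCy's doc.ents, and entities_to_aux_data is a hypothetical helper, not DialBB's actual code:

```python
def entities_to_aux_data(entities, aux_data=None) -> dict:
    """Pack (label, text) entity pairs into the NE_<label> format,
    joining multiple entities of the same class with ':'."""
    result = dict(aux_data or {})
    for label, text in entities:
        key = "NE_" + label
        result[key] = result[key] + ":" + text if key in result else text
    return result

aux = entities_to_aux_data(
    [("Person", "John"), ("Person", "Mary"), ("Dish", "Chicken Marsala")]
)
# aux == {"NE_Person": "John:Mary", "NE_Dish": "Chicken Marsala"}
```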