Building (and integrating) a ChatBot using govCMS (SaaS), NodeJS and Dialogflow

It all started around 5 or 6 months ago, when we got an email from the govCMS team asking for our feedback on how they could improve their help/support section for govCMS in general. I still remember De’leon and I talking about our very positive feedback internally (in one of our tea/coffee breaks), and that break/talk eventually turned (as happens most of the time) into a brainstorming session.

TL;DR (just read the title, and watch the demo video)

End user behaviour and background to this POC

Long story short, we identified a few common patterns in the govCMS topics/questions our stakeholders ask (the ones you would expect from most stakeholders). Depending on the role, the query varies from “can I get this ‘x’ web feature/component in govCMS” to “can I integrate this ‘y’ service” to “how can I do this ‘z’ task in govCMS”. Sometimes these queries are very vague in nature and mostly depend on the theme/component or framework you build (or are going to provide) for your stakeholders on top of the basic govCMS framework, and I am sure you can already guess why simply Googling is not enough for the intended answers, due to this vagueness. To tackle the most obvious questions about the daily routine tasks that end users need to do in govCMS (or any other CMS, really), we created a CMS training area for our clients, which has step-by-step SOPs and screencasts bundled as a Knowledge Base system (funny enough, it’s built on govCMS). This KB/help system works almost fine, but every now and then we still get similar queries (from stakeholders and team-mates) over MS Lync, our integrated communication tool. And this behaviour (queries/conversations over Lync) is not limited to govCMS; it ranges from SharePoint, ASP/HTML frameworks and general JS/jQuery to integration with existing forms etc., and the audience is just as diverse, spanning multiple OUs, departments and sections with different roles (and I am sure you know what that is like).

If you only want to see our implementation, skip the text below and jump to the video. I promise (hope) that it will be less boring than all this text 😉

I do get/understand this approach (this way of communicating). It’s like asking your team-mates about, let’s say, an internal framework class (or a Bootstrap class) name before you actually ask Google, check the reference site or look at your user guide site. We all do that, and I have a feeling we do it deliberately. It’s more engaging this way than typing into Google, seeing a flood of information and then sorting through it to find the one you need (and sometimes you get stuck in a trial-and-error maze before you land on the one you were actually looking for).

Now, that brainstorming session, along with the repetitive Lync conversation behaviour I just mentioned, triggered this idea:

Why not create a Bot and integrate it with our KB system?

We can then embed the whole thing in the KB website, and our users can ask the same questions. With a bit of tweaking to our existing KB IA/design, the Bot can answer rather accurately, or show the correct path to more closely related information (similar to what we do over Lync when we respond, only this time it’s a Bot and not me or De’leon).

Now that you know the background, before going any further let’s see the Bot in action; it will probably make more sense then (see the video below).

Demo

 

Please note this is a POC project that was built in our free time (so expect a bit of crudeness).

 

As you can see, the Bot can respond in the following ways:

  • Step by step guide
  • Video
  • URL link
  • SOP
  • Layout (details and suggestions)
  • Code repository (like private gist in github)
  • User Guide
  • Component etc.

A conversational example

Just to illustrate how the conversation works, here is an example. You can ask something like:

  • can you show me some layouts?

And the Bot would reply:

  • Which framework?

As there are multiple frameworks (e.g. SharePoint), once you say which one, the Bot can fetch the layouts for that framework and ask further:

  • Any particular layout, or all of them (three columns, two columns)?

Once you say which one you are after (e.g. two columns), it can show you details of the available two-column layouts for that framework.

Or you could ask for a code example, a class name or a video tutorial, and it would try to respond accordingly (if there is data in the KB system).

Please note: this list can grow very quickly if the design is not done properly; then again, you are only limited by your design.

As you saw in the video above, it can cover all these queries.

So how does this work?

Well, you probably already have a basic (or advanced) idea about how a typical Bot works (as they say, 2018 is the year of the Bots). You have probably also heard about the big providers (Google, Facebook, Microsoft etc.). We used Dialogflow (for no particular reason) for our POC. In a very broad sense, a request to a typical Bot in Google’s Dialogflow framework involves these steps:

  1. The user asks a question (in our case through a chat window embedded in HTML in the web browser)
  2. The request text gets parsed by JS (integrated in the chat window) and sent to Dialogflow (DF)
  3. DF selects a suitable intent (that we defined beforehand) and invokes fulfilment (FF) if required
  4. If a webhook (WH) is provided for the FF, DF invokes that WH
  5. DF makes a POST request (defined in its FF section) to the service URL (in our case it’s a Node server)
  6. Our Node server (NS) extracts the incoming info (passed by DF), does the required data operations on the server and replies back to DF with the result
  7. DF evaluates and validates the response, and sends the final reply back to the JS (our browser, where the chat window resides)
  8. JS updates the chat window and renders the reply
  9. DF is ready for the next request
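The webhook part of the flow above (steps 5–7) can be sketched in a few lines of Node. Everything below is illustrative: the intent name `show-layouts` and the reply texts are invented for this example, but the request/response shapes follow Dialogflow’s V2 webhook format (`queryResult` in, `fulfillmentText` out).

```javascript
// Pure handler: takes the parsed webhook POST body from DF,
// returns the reply object DF expects back.
function handleWebhook(body) {
  // Dialogflow V2 puts the matched intent and extracted
  // parameters under queryResult in the webhook request.
  const intent = body.queryResult.intent.displayName;
  const params = body.queryResult.parameters;

  if (intent === 'show-layouts') { // hypothetical intent name
    return { fulfillmentText: `Fetching layouts for ${params.framework}...` };
  }
  return { fulfillmentText: "Sorry, I don't have an answer for that yet." };
}

// In the real server this would be wrapped in an HTTP handler, e.g.
//   app.post('/webhook', (req, res) => res.json(handleWebhook(req.body)));
```

Keeping the handler a pure function (parsed body in, reply object out) makes it easy to test without spinning up a server.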

That was a very, very general explanation of the data flow (a lot happens under the hood). But for simplicity’s sake, keeping that in mind, let’s expand this process a bit more, as for our scenario we actually need to do a bit more data manipulation (which depends on an API call) before we send our reply to DF. As we mentioned before, we want to integrate govCMS as our content service, and for that this flow can be extended as shown below. Again, do note, this is a very superficial explanation (please excuse my rough sketch).

Let’s identify the steps

(you can see steps 1-4 are exactly the same)

  1. The user asks a question (in our case through a chat window embedded in HTML in the web browser)
  2. The request text gets parsed by JS (integrated in the chat window) and sent to Dialogflow (DF)
  3. DF selects a suitable intent (that we defined beforehand) and invokes fulfilment (FF) if required
  4. If a webhook (WH) is provided for the FF, DF invokes that WH
  5. DF makes a POST request, defined in its fulfilment section (FF), to the service URL (in our case it’s a Node server)
  6. Our Node server (NS) extracts the info passed by DF (intents/actions and required parameters), but this time it needs more information, based on the parameters it received, before it can get back to DF
  7. NS restructures the parameters and calls an API (this is our govCMS API) to get the required information
  8. Our govCMS (D7) responds to the API call and spits the JSON data back to NS
  9. NS validates the data, restructures the JSON into a format suitable for DF and sends it back to DF
  10. DF evaluates and validates the response, and sends the final reply back to the JS in the web browser (please note, for the POC demo we are not validating the data)
  11. JS updates the chat window and renders the reply

Are you confused yet? No? Well, let me try again ;).

Demo

See the video below (and again, excuse my rough sketch) that shows the flow.

Boring technical bits

It’s actually easier than you think. The hardest bit was (and will be) coming up with a good and extendable schema for your Knowledge Base (KB) entity. This is what we have done:

  • Define the types of queries end users can make against the KB
  • Define enough intents and related contexts (parameters) to identify and cover a wide range of KB items for a general user
  • Finally, define fields for the KB content type (to support the above two points) and make it flexible for future changes
  • Create APIs that use those parameters and refined fields to produce structured JSON data
  • Pick your Bot framework provider (we used Dialogflow in this POC) and recreate the identified contexts/parameters while creating the related intents
  • Add training texts and provide responses
  • Mark the mandatory/required parameters that are needed to make the API call later on
  • Include the Welcome/Small Talk (prebuilt) agent to handle the usual chit-chat (conversation starter)
  • Test the training and fix the intents if the contexts are not picked up properly
  • Use Node.js to handle Dialogflow (DF)’s fulfilment POST for the webhook (we could have used PHP, but we wanted to try out Node.js)
  • In Node.js, extract the required parameter(s) by catching the correct intent/action (a simple if/else or switch)
  • Construct the API request with the received parameters to filter/refine the result we want
  • Reformat the returned result in Node.js and return it to DF
  • Once the data is received from DF by the JS in our site, reformat it into HTML before rendering so that we can have rich text output
  • Append the HTML to the chat window
  • Finally, use CSS to style the appearance of the chat window
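The last few bullets (reformatting the DF reply into HTML and appending it to the chat window) could look like this rough sketch. The item shape (`type`, `title`, `url`) is our own invented payload, not a Dialogflow standard, and the class name is just for the CSS styling step.

```javascript
// Turn one reply item into an HTML string for the chat window.
function renderReply(item) {
  // Escape user-visible text so KB content can't inject markup.
  const esc = s => String(s).replace(/[&<>"]/g,
    c => ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' }[c]));

  switch (item.type) {
    case 'video':
      return `<div class="bot-msg"><video src="${esc(item.url)}" controls></video></div>`;
    case 'link':
      return `<div class="bot-msg"><a href="${esc(item.url)}">${esc(item.title)}</a></div>`;
    default:
      return `<div class="bot-msg">${esc(item.title)}</div>`;
  }
}

// In the browser you would then append it to the chat window:
//   chatWindow.insertAdjacentHTML('beforeend', renderReply(item));
```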

Looking forward

Now what’s the big deal about this?

Well, for this specific POC, to be honest, not much really. It was more about seeing an idea in action (after all, it’s a POC, right?). But it was an interesting POC, to say the least, for a couple of reasons:

  • We wanted to use the govCMS SaaS distribution only (so no extra modules).
  • We wanted to build a system where the end users populate the content, so they can manage any future KB items internally.
  • We wanted to use rich messages in our website (but without tapping into a 3rd party conversational/messaging platform, e.g. Slack or FB Messenger).
  • We also wanted to inject our own custom responses (other than the provided rich messages) before the message gets rendered.

But it can be a big deal if you can extend this properly for your end users (and hopefully we can). And when I say properly, I mean with the information your users are interested in, specific to your org, backed by user interaction data from analytics like GA, or behaviour data from a tracker like Hotjar.

You can serve and control this information from a single centralised provider or from multiple providers (different base APIs from different systems), depending on how those systems are implemented. The bottom line is, it doesn’t really matter, as services from either scenario can be integrated and controlled in your NodeJS server. A couple of services you could extend this to come to mind straight away (see below).

  • Forms (leave, feedback etc.)
  • Procedures / Manuals (SOP) directory
  • Booking system (Carpooling etc.)
  • Shuttle service (and many similar common services)
  • People Directory system
  • Self serving kiosk (for general info) etc.

Let’s not get carried away

Now back to where we started: using Lync as a conversational medium for tuned and accurate results. And here is the interesting bit:

you can use Skype (or Skype for Business) and implement the same conversational behaviour (remember we talked about communicating through Lync?).

Yes, you will need to redefine the rich message section, but it will work almost exactly the same way.

Wrapping up

Thanks for staying with me this far. Let’s finish this article by considering two more scenarios.

  • You tell the Bot you want to create a piece of content; the Bot replies asking you to choose one of the allowed content types. Once chosen, it gives you the option to select which layout you want (again, from the allowed layouts for the selected content type). Once chosen, it asks you if you want any other components added. It may also give you the option to choose different variants of the components (you know, views). Once that bit is done, it can finally preview what the content will look like, using pre-built templates. All without typing a single line of code. Now, wouldn’t that be nice?

What I have found is that a visual, with actual content rendered in the actual layout(s), helps to bridge gaps between different stakeholders during the UI finalisation phase and helps to manage/set end user expectations.

  • Now for the second scenario (you have probably seen/used this in mobile apps): on handheld devices, give end users the option to search by voice (with an integrated bot), so a user can search by saying “show me the topic ‘xyz’ from last week” or “when is the event ‘abc’ happening?”

But I am gonna pause here, that’s for a different day.

Let me know your thoughts!
