Data and Technology: User traversal digraphs

Visualizing the users' traversal on a website or path traversed in an application or nodes visited may need to be analyzed for better user experience or improving the efficiency the path traversed or to improve the conversion rate by getting the user to the end state. For example, it may be that users visiting one set of nodes (node meaning particular state of application like tutorial state machine) convert better than users going through different set of states. There are many cases where a graph or directed graph is a necessity.

You can use freeware tools like ‘dot’ to draw directed graphs. In the following example, I assume the state/node is represented by simple integer and path taken by the edge (directed line).

Here, I have shown a simplified version of a project I worked on in quickly generating a digraph that can be sent or uploaded to site for internal business users on regular intervals in real time. Few assumptions are made to help explain the main idea and they are - each visit to the node is timestamped and user can only traverse forward (meaning visit to the node number higher than the current one). To start with data is summarized in a table with each user or device or some unique key identifier, node visited, and time of visit.

Order the data in chronological order for each unique id so that by doing the self-join (see code snippet below) in SQL we can simply find out the next node the user visited.

SELECT T1.node_id current_node, T2.node_id next_node, count(1) ct
FROM table T1 JOIN table T2 ON T1.row_id+1 = T2.row_id
WHERE T1.unique_id = T2.unique_id # say, user_id
GROUP BY T1.node_id, T2.node_id

You can also provide weightage/cost to the edge by normalizing the count which will result in set of rows similar to
Current_node, next_node, count, count/max_count

This is all the data we need to generate input file for ‘dot’ application. Write a program that takes the above input and dump it into a file with content –

Digraph G {
# splines=false;
rankdir=LR

2 ->3 [penwidth=4.9, label="1190"]
3 -> 4 [penwidth=4.9, label="1150"]
4 -> 5 [penwidth=4.8]
5 -> 6 [penwidth=4.8]
6 -> 7 [penwidth=4.8]
7 -> 8 [penwidth=4.8]
…
…
}

By providing this as input you can generate the output in multiple formats including pdf, ps, etc. See graph 1 below. You can provide more input parameters in the file to fancy the graph or with more information like drop-off (% of users dropped) between states, see graph 2. In essence you are generating a digraph to visualize the data in a more meaningful way.

Digraph 1 - with sample weights between states 2 & 3, 3 & 4

With dot input file content like

subgraph cluster1 {
rankdir="LR"
2 -> 3 [penwidth="23", label="23.48%", taillabel="4450", headlabel="3405"]
3 -> 4 [penwidth="25", label="24.9%", taillabel="3405", headlabel="2557"]
4 -> 5 [penwidth="18", label="18.34%", taillabel="2557", headlabel="2088"]
5 -> 6 [penwidth="19", label="19.3%", taillabel="2088", headlabel="1685"]
6 -> 7 [penwidth="20", label="20.18%", taillabel="1685", headlabel="1345"]
7 -> 8 [penwidth="26", label="26.47%", taillabel="1345", headlabel="989"]
8 -> 9 [penwidth="35", label="35.29%", taillabel="989", headlabel="640"]
9 -> 10 [penwidth="39", label="38.59%", taillabel="640", headlabel="393"]
10 -> 11 [penwidth="36", label="35.88%", taillabel="393", headlabel="252"]
}

Digraph 2 - with users drop-off between states in %

User traversal digraphs

No comments:

Post a Comment