Sessionization

Sessionize logs based on the time interval between visits.

Description

Sessionization is a common analytic operation in log analysis. Given an input pageview table, where each row records a webpage visit made by a particular user (or IP address), the sessionization operation identifies each user's web browsing sessions from the recorded visits by grouping that user's visits according to the time intervals between them.

Conceptually, if two consecutive visits from the same user are too far apart in time (as defined by a time-out threshold), they are treated as belonging to two separate browsing sessions.
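
The grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the concept, not how TD_SESSIONIZE() is implemented. The function name `sessionize`, the tuple layout, the integer session ids, and the choice that a gap strictly greater than the time-out starts a new session are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical row layout for this sketch: (time, user_id, path)
Visit = Tuple[int, str, str]


def sessionize(visits: List[Visit], timeout: int = 3600) -> List[Tuple[int, int, str, str]]:
    """Return (session_id, time, user_id, path) rows in time order.

    Assumption: a new session starts when a user's gap since their
    previous visit is strictly greater than `timeout` seconds.
    """
    last_seen: Dict[str, int] = {}  # user_id -> time of that user's previous visit
    current: Dict[str, int] = {}    # user_id -> that user's current session id
    next_id = 1
    out = []
    for time, user_id, path in sorted(visits, key=lambda v: v[0]):
        prev = last_seen.get(user_id)
        if prev is None or time - prev > timeout:
            # First visit from this user, or the gap exceeded the time-out.
            current[user_id] = next_id
            next_id += 1
        last_seen[user_id] = time
        out.append((current[user_id], time, user_id, path))
    return out


if __name__ == "__main__":
    # Made-up visits: user "a" returns after 30 minutes, then after 90 minutes.
    demo = [(0, "a", "/"), (10, "b", "/"), (1800, "a", "/about"), (7200, "a", "/team")]
    print([row[0] for row in sessionize(demo, timeout=3600)])  # [1, 2, 1, 3]
```

In the made-up data, user "a" keeps session 1 for the visit 30 minutes later, but the visit after a further 90 minutes exceeds the 1-hour time-out and opens session 3.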

Treasure Data's Hive engine provides the TD_SESSIONIZE() function, which makes sessionization straightforward. The example below uses a time-out of 1 hour to split the sessions.

Sample Input

access
time user_id path
1416435585 3 http://docs.treasuredata.com/
1416435586 3 http://www.treasuredata.com/
1416435587 5 http://www.treasuredata.com/products/
1416435588 1 http://www.treasuredata.com/about/
1416435598 1 http://www.treasuredata.com/products/
1416435628 1 http://www.treasuredata.com/team/
... ... ...

Query



Sample Output

session_id time user_id path
1 1416435585 3 http://docs.treasuredata.com/
1 1416435586 3 http://www.treasuredata.com/
2 1416435587 5 http://www.treasuredata.com/products/
3 1416435588 1 http://www.treasuredata.com/about/
3 1416435598 1 http://www.treasuredata.com/products/
3 1416435628 1 http://www.treasuredata.com/team/
... ... ... ...