Finding stale user accounts using Chronicle Data Lake

Chris Martin (@thatsiemguy)
2 min read · Jan 3, 2023

Q. Can you use Chronicle SIEM to report on stale user accounts, i.e., accounts that are still active but that no one has logged into recently?

A. Yes, and it’s a neat example of when to use the Chronicle Data Lake (aka BigQuery) together with the Entity Graph.

Cleaning up old accounts, an ideal task for automation

Stale user SQL Statement

Using your BigQuery client of choice, here’s a SQL statement that uses two CTEs to query:

  1. user_accounts — all known accounts in the Chronicle Entity Graph, including each User Entity’s creation date and authentication status, as observed in the last 30 days
  2. user_activity — all user authentication activity observed in Chronicle Events, also over the last 30 days
WITH
  -- All user accounts observed in the Entity Graph over the last month,
  -- limited to the Google Workspace Users context source
  user_accounts AS (
    SELECT
      email AS user,
      entity.user.user_authentication_status AS status,
      MIN(TIMESTAMP_SECONDS(entity.user.attribute.creation_time.seconds)) AS creation_date
    FROM
      `datalake.entity_graph`,
      UNNEST(entity.user.email_addresses) AS email
    WHERE
      DATE(_PARTITIONTIME) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
      AND metadata.vendor_name = "Google"
      AND metadata.product_name = "Workspace Users"
    GROUP BY
      1, 2
    ORDER BY
      3 ASC ),
  -- Most recent observed login per user across all USER_LOGIN (15001) events
  -- in the last month
  user_activity AS (
    SELECT
      email AS user,
      MAX(TIMESTAMP_SECONDS(metadata.event_timestamp.seconds)) AS last_login
    FROM
      `datalake.events`,
      UNNEST(target.user.email_addresses) AS email
    WHERE
      DATE(hour_time_bucket) > DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
      AND metadata.event_type = 15001
    GROUP BY
      1 )
-- Join every known account to its most recent login; a NULL last_login means
-- no authentication activity was observed in the lookback window
SELECT
  user_accounts.user,
  user_accounts.status,
  user_accounts.creation_date,
  user_activity.last_login,
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), user_activity.last_login, DAY) AS last_logged_in_days_ago
FROM
  user_accounts
LEFT JOIN
  user_activity
ON
  user_activity.user = user_accounts.user
ORDER BY
  4 DESC

And the results:

| user                      | status | creation_date                  | last_login                     | last_logged_in_days_ago |
|---------------------------|--------|--------------------------------|--------------------------------|-------------------------|
| thatsiemguy@altostrat.com | 1      | 2021-09-22 08:08:29.000000 UTC | 2023-01-02 19:54:13.000000 UTC | 0                       |
| alice@altostrat.com       | 1      | 2022-02-04 14:01:30.000000 UTC | 2022-12-23 13:39:46.000000 UTC | 10                      |
| chuck@altostrat.com       | 1      | 2022-08-18 15:47:59.000000 UTC |                                |                         |

The results show:

  • the User’s creation date
    - this will come from your source of truth, e.g., Cloud Identity, Azure AD, or Okta
    - if you have multiple Context Sources you may need to update the SQL statement’s WHERE clause to limit which source is queried (as per the example above)
  • the User’s last login
    - this evaluates all log sources; apply a WHERE clause if you only want to check specific authentication sources
    - UDM validation requires a User in Target, hence we can use target.user.email_addresses
    - one caveat: if your IdP doesn’t use email addresses, you will need to refactor the SQL statement to use user.userid instead (see the sketch after this list)
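For example, here’s a rough sketch of how the user_activity CTE might look keyed on userid and limited to a single authentication source — "Example IdP" is a placeholder product name, and the user_accounts CTE would need the matching change to entity.user.userid so the join keys still line up:

-- Sketch only: user_activity keyed on userid instead of email address,
-- and restricted to one (placeholder) authentication source
user_activity AS (
  SELECT
    target.user.userid AS user,
    MAX(TIMESTAMP_SECONDS(metadata.event_timestamp.seconds)) AS last_login
  FROM
    `datalake.events`
  WHERE
    DATE(hour_time_bucket) > DATE_SUB(CURRENT_DATE(), INTERVAL 1 MONTH)
    AND metadata.event_type = 15001               -- USER_LOGIN
    AND metadata.product_name = "Example IdP"     -- placeholder: your IdP's log source
  GROUP BY
    1 )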

Summary

You will likely need to tune the statement for your environment and manually review the output to confirm the results are accurate. Taking things a step further, this is a useful example of using the Chronicle Data Lake as the basis of SOAR automation, e.g., run the query above on a schedule and contact the user in question before deactivating the account.
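As a minimal sketch of what such a scheduled job might run — assuming the query above has been saved as a view named stale_account_report (a hypothetical name) and a 30-day staleness threshold — a SOAR playbook could pull only the accounts that need follow-up:

-- Sketch only: accounts with no observed login in 30+ days, or no observed
-- login at all, for a scheduled SOAR playbook to action
SELECT
  user,
  status,
  creation_date,
  last_login,
  last_logged_in_days_ago
FROM
  `stale_account_report`
WHERE
  last_login IS NULL
  OR last_logged_in_days_ago >= 30
ORDER BY
  last_logged_in_days_ago DESC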
