Page Value in GA4

Contents
    Add a header to begin generating the table of contents

    GA4 doesn’t have a metric that is comparable to the Page Value metric from Universal Analytics. In this post, we will recreate the metric using GA4 data in Google BigQuery. 

    Google defined the UA metric as:

    Page Value is the average value for a page that a user visited before landing on the goal page or completing an Ecommerce transaction (or both). This value is intended to give you an idea of which page in your site contributed more to your site’s revenue. If the page wasn’t involved in an ecommerce transaction for your website in any way, then the Page Value for that page will be $0 since the page was never visited in a session where a transaction occurred.

    — from How Page Value is calculated

    GA4 & BigQuery

    The free integration with BigQuery is one of the things that got me excited about GA4 in the early days after it’s release. I love playing with data and it seemed like the ultimate playground. But actually making GA4 data useful in BigQuery was a much heavier lift than I expected. If you are in the same place as I was, be patient and do the work. I’m only part way to where I want to be, but I am starting to feel unbound by the constraints GA4, UA or any other tool impose on how I analyze and use website visitor data.

    A couple of resources I can’t recommend too highly for learning the ins-and-outs of GA4 data in BigQuery:

    1. GA4BigQuery.com – seriously, if there was a nobel prize for analytics, it would go to Johan van de Werken for creating this site.
    2. And Johan teamed up with Simo Ahava to create the online course: Query GA4 Data in Google BigQuery. The course is 600€, but well worth it in my opinion. You will understand GA4 metrics and dimensions a lot better after the course, as well as how to recreate them in BigQuery. I also thought I was pretty competent with Standard SQL before I took it, but I got a lot more comfortable with window functions, scalar subqueries and other crafty SQL techniques.
    I have also started operationalizing my queries in Google Dataform. If you work with multiple GA4 properties, making queries reusable and keeping track of all of your transformations is challenging in BigQuery alone. Dataform makes the process a lot more streamlined. dbt is a similar and also good option.

    The Page Value Query

    This query is actually a series of subqueries that work together to produce the result we are looking for. I’ve separated out each subquery below so I can explain what is happening along the way. These subqueries won’t work by themselves, so you can copy the full query from here

    Keep in mind that analysis in BigQuery does incur costs, though in my experience they tend to be pennies per month unless I am working with very large datasets.

    The ‘with’ clause below is a way to organize subqueries for readability. I could have glommed them all together into one statement with a bunch of inline subqueries – the syntax ‘with x as (query)’ does exactly the same thing. When done this way, you can reference ‘x’ as a table in subsequent queries.

    The first subquery is just a sneaky trick for setting a date range in one place versus having to update it in multiple places.

    				
    					with dates as (
      select 
        '20230812' as start_date,
        -- the next line gets yesterday
        format_date('%Y%m%d',date_sub(current_date(), interval 1 day)) as end_date
    ),
    
    
    
    
    
    
    
    				
    			

    The query below gets all events with associated revenue and timestamps. Note that it is getting value in USD. The ecommerce.purchase_revenue field stores revenue in the GA4 property default currency, but I chose to use event_value_in_usd instead, because it also aggregates non-ecommerce conversion values. 

    I am concatenating user_pseudo_id and ga_session_id to get session_id. This is because ga_session_id is really just a timestamp, so not necessarily unique to a user. Weird.

    Make sure to change the table reference to your table. The syntax ‘events_*’ followed by ‘where _table_suffix …’ is because the GA4 events table is date-partitioned. Each partition reference ends in ‘_YYYYMMDD’. Using ‘_table_suffix’ in the where clause enables you to query across partitions.

    				
    					p as (
      select
        concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
        event_timestamp,
        sum(event_value_in_usd) as event_value
      from `your-gcp-project.your-ga4-dataset_123456.events_*`
      where _table_suffix between (select start_date from dates) and (select end_date from dates)
      and event_value_in_usd is not null
      group by session_id, event_timestamp),
    				
    			

    The next query gets all page views with timestamps.

    				
    					q as (
      select 
        concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as session_id,
        event_timestamp,
        (select value.string_value from unnest(event_params) where key = 'page_location') as page_location
      from `your-gcp-project.your-ga4-dataset_123456.events_*`
      where _table_suffix between (select start_date from dates) and (select end_date from dates)
      and event_name = 'page_view'
    ),
    				
    			

    In the following subquery I am doing several odd/tricky things:

    • I am creating a pseudo_session_id which concatenates the session_id with the timestamp of each revenue event. This is to account for the fact that multiple revenue events can happen in the same session, and when they do we only want to assign value to page views that happened before each event. Since we are only keeping track of sessions as a means to an end, I’m breaking a single session into multiple pseudo-sessions when there are multiple revenue events.
    • For the previous logic to work, I have to do a full outer join between the revenue events table and the page views table. The full outer join results in records for every combination of page views and revenue events for a given session_id.
    • I am assigning all event_value to each page view event. This seems like a mistake, but I’ll come back in the next query and divide that number by a count of the number of page views that led up to the revenue event.
    • I keep track of whether or not page views happened before each revenue event with the is_before_revenue_event field. This is how I make sure to only assign value to page views that happen before each revenue event. 
    				
    					r as (
      select 
      concat(q.session_id,p.event_timestamp) as pseudo_session_id,
      p.event_value,
      q.page_location,
      (case when (q.event_timestamp < p.event_timestamp)  then true else false end) as is_before_revenue_event
    from q
    full outer join p on q.session_id = p.session_id
    order by q.session_id),
    				
    			

    The next query is where I divvy up event value for each page view. I accomplish this with a window function. Window functions are too big a topic to fully explain here, but the basic idea is that a window function can do operations on multiple rows in a dataset in contrast with a standard select statement which operates row-by-row.

    				
    					s as (
      select 
      pseudo_session_id,
      page_location,
      event_value / (count(page_location) over (partition by pseudo_session_id)) as page_revenue
    from r
    where is_before_revenue_event),
    				
    			

    We’re getting close now. This query just sums up revenue by page. At this point, this is just an accumulation of page value. We need to take into account how many times each page has been viewed overall to replicate the Universal Analytics Page Value metric. We will do that in the next query.

    				
    					t as (
      select 
      page_location,
      sum(page_revenue) as page_revenue
    from s
    group by page_location),
    				
    			

    This query gets a count of all page views by page_location. We’ll use this as the denominator for page_value in the last query.

    				
    					u as (select 
      page_location,
      count(page_location) as views
    from q
    group by page_location)
    				
    			

    Bringing it all together, we divide the sum of value for each page by the total number of times the page was viewed to get page_value. Boom!

    				
    					select 
      u.page_location,
      u.views,
      t.page_revenue/u.views as page_value
    from u
    left join t on u.page_location = t.page_location
    order by views desc
    				
    			

    A few things to know/keep in mind:

    • There can be circumstances where a session has a conversion but no page views precede the conversion, so the sum of page value may be less than total conversion value. An example is when a user waits more than 30 minutes before clicking a submit button. GA4 will create a new ga_session_id when more than 30 minutes has passed since the last session event.
    • If a page_view event is also counted as a revenue event, it will not get assigned any page value. In other words, for a page view to be counted, it has to precede the revenue event. In my opinion this makes sense, but you can change the ‘<‘ in query ‘r’ to a ‘<=’ if you want to assign page value to page views that are revenue events.

    Other things we could do with this query

    One of the things I love about working with GA4 data in BigQuery is that once you have figured out how to replicate a GA4 dimension or metric (or a metric from UA in this case), you can tweak your query to better suit your needs. Here are a few things I’ve done or thought about doing with this query:

    • Filter out pages that shouldn’t be accruing page value. For example, checkout pages aren’t really contributing to the likelihood of conversion, they are just necessary steps along the way.
    • Assigning value to other events in addition to or instead of page_views. For example, if you have a video-rich site, you could replace the page_view event with video_start and the page_location parameter with video_url.
    • Weighting the assignment of page value based on time sequence/recency. For example, a good argument could be made that the last page viewed before conversion deserves more credit than previous pages. Doing this would take some window-function fanciness in subquery ‘s’ above, but it would be fun to try.
    • With a simple tweak to subquery ‘p’ above, you could count each conversion as having a value of ‘1’ instead of using the event_value_in_usd field. This would result in a page value metric that works for properties that don’t focus on ecommerce revenue or set conversion values. Essentially, it would be a way of measuring how much influence  a given page has on any downstream conversion.
    • Calculate page value by day and append to a table that stores it over time. This would prevent having to run the full query every time. If you did it this way, it would make sense to store page_revenue from query ‘t’ and then calculate page value as sum(page_revenue)/sum(views) when you need it.

    Reporting on Page Value in Looker Studio

    So far we’ve gone over how to generate a page value metric, but you probably also want to incorporate it into your reporting. If you are proficient in BigQuery, you probably have ideas of how you can do that, but if not, Looker Studio is a great option. The BigQuery connector in Looker Studio has a CUSTOM QUERY option that allows you to execute this query directly from Looker Studio. To use it:

    1. Add a BigQuery data source and select CUSTOM QUERY
    2. Pick the project that includes your GA4 dataset
    3. Copy the full query from here
    4. Update the table references in subqueries ‘p’ and ‘q’ to point to your GA4 table
    5. Select ‘Enable date range parameters’ below the custom query input box
    6. Edit the ‘dates’ subquery to match the query below
    7. Save your connection and you are ready to report on Page value in Looker Studio! 
    				
    					with dates as (
      select 
        @DS_START_DATE as start_date,
    	@DS_END_DATE as end_date
    ),
    				
    			

    What’s great about this approach is that you can update the dates with a standard Looker Studio date range control. But also keep in mind that this will run the query every time you view the data. If you have a high volume of GA4 events or your dashboard gets a lot of views, this method could incur some BigQuery costs.

    gold lines expanding outward

    Learn more about GA4 reporting in Looker Studio in our tutorials, including how to report on individual GA4 conversions in Looker Studio and how to create a GA4 scroll tracking report in Looker Studio

    Two Octobers upskills digital marketers of all skill levels. Check out our GA4 and Looker Studio small-group training program.

    Don't Miss a Beat

    Marketing analytics insights, delivered to you.

    Two monthly emails featuring our latest guides and discoveries.

    have you registered?

    Our next free digital marketing seminar is coming soon!

    [MEC id="946"]