Data Warehousing for Google Analytics Data


How it works

Needl uses the Google Analytics API to extract and enhance your data and then store it in BigQuery - Google's cloud data warehousing solution. An initialization will go back in time to build up your data and then make daily updates.

The Google Analytics API limits the number of dimensions requested in a single query, Needl overcomes this by sending off multiple queries and stitching the results together.

BigQuery connects well with analytics tools such as Data Studio and Google’s ML platform. Give your data a proper home!


Features:

  • Session level data rather than aggregated - daily updates.
  • 25 dimensions as standard - see typical schema below.
  • Sampling can be avoided - run reports across date ranges and as long as there is less than 500K sessions per day, no sampling involved (limit may be lifted in future versions).
  • Bucketed dimensions to aid analytics:
    • device = deviceCategory + operatingSystem + browser
    • daypart = combined dayOfWeek and hour
  • "Has visited" dimensions - has the user visited a url or zone (fuzzy matched page path) during a session.
  • Add in your own data - if you can key your data by say location and time, you should be able to join with the extracted data
  • Support for client id if harvested and fed back into GA via custom dimension (how to)
  • Access to further dimensions if client id used - client id allows us to pull more data from GA, including custom dimensions.


See it in action

Example dataset in BigQuery:

https://console.cloud.google.com/bigquery?project=needlbq&p=needlbq&d=public&t=demo&page=table


Typical Schema

Field name Type Field name Type
partitionDate DATE user RECORD
time RECORD user. userType STRING
time. utcDateTime TIMESTAMP user. sessionCount INTEGER
time. dateTime TIMESTAMP user. sessionFrequency STRING
time. hour INTEGER user. sinceSessionBucket STRING
time. dayOfWeekName STRING user. weeksSinceLastSession INTEGER
time. workingDay STRING user. daysSinceLastSession INTEGER
time. dayPart STRING device RECORD
geoNetwork RECORD device. osMajorVersion STRING
geoNetwork. cityId STRING device. deviceCategory STRING
geoNetwork. city STRING device. operatingSystem STRING
geoNetwork. region STRING device. browser STRING
geoNetwork. metro STRING device. operatingSystemVersion STRING
geoNetwork. metroRegion STRING totals RECORD
geoNetwork. country STRING totals. sessions INTEGER
geoNetwork. longitude FLOAT totals. bounces INTEGER
geoNetwork. latitude FLOAT totals. sessionDuration FLOAT
trafficSource RECORD totals. transactions INTEGER
trafficSource. source STRING totals. transactionRevenue FLOAT
trafficSource. medium STRING totals. goal1Completions INTEGER
trafficSource. trafficType STRING totals. goal2Completions INTEGER

Book a demo for free!