In BigQuery, an array is an ordered list consisting of zero or more values of the same data type. You can construct arrays of simple data types, such as INT64
, and complex data types, such as STRUCT
s. The current exception to this is the ARRAY
data type because arrays of arrays are not supported. To learn more about the ARRAY
data type, including NULL
handling, see Array type.
With BigQuery, you can construct array literals, build arrays from subqueries using the ARRAY
function, and aggregate values into an array using the ARRAY_AGG
function.
You can combine arrays using functions like ARRAY_CONCAT()
, and convert arrays to strings using ARRAY_TO_STRING()
.
Constructing arrays
Using array literals
You can build an array literal in BigQuery using brackets ([
and ]
). Each element in an array is separated by a comma.
SELECT [1, 2, 3] as numbers; SELECT ["apple", "pear", "orange"] as fruit; SELECT [true, false, true] as booleans;
You can also create arrays from any expressions that have compatible types. For example:
SELECT [a, b, c] FROM (SELECT 5 AS a, 37 AS b, 406 AS c); SELECT [a, b, c] FROM (SELECT CAST(5 AS INT64) AS a, CAST(37 AS FLOAT64) AS b, 406 AS c);
Notice that the second example contains three expressions: one that returns an INT64
, one that returns a FLOAT64
, and one that declares a literal. This expression works because all three expressions share FLOAT64
as a supertype.
To declare a specific data type for an array, use angle brackets (<
and >
). For example:
SELECT ARRAY<FLOAT64>[1, 2, 3] as floats;
Arrays of most data types, such as INT64
or STRING
, don’t require that you declare them first.
SELECT [1, 2, 3] as numbers;
You can write an empty array of a specific type using ARRAY<type>[]
. You can also write an untyped empty array using []
, in which case BigQuery attempts to infer the array type from the surrounding context. If BigQuery cannot infer a type, the default type ARRAY<INT64>
is used.
Using generated values
You can also construct an ARRAY
with generated values.
Generating arrays of integers
GENERATE_ARRAY
generates an array of values from a starting and ending value and a step value. For example, the following query generates an array that contains all of the odd integers from 11 to 33, inclusive:
SELECT GENERATE_ARRAY(11, 33, 2) AS odds;
+--------------------------------------------------+ | odds | +--------------------------------------------------+ | [11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33] | +--------------------------------------------------+
You can also generate an array of values in descending order by giving a negative step value:
SELECT GENERATE_ARRAY(21, 14, -1) AS countdown;
+----------------------------------+ | countdown | +----------------------------------+ | [21, 20, 19, 18, 17, 16, 15, 14] | +----------------------------------+
Generating arrays of dates
GENERATE_DATE_ARRAY
generates an array of DATE
s from a starting and ending DATE
and a step INTERVAL
.
You can generate a set of DATE
values using GENERATE_DATE_ARRAY
. For example, this query returns the current DATE
and the following DATE
s at 1 WEEK
intervals up to and including a later DATE
:
SELECT GENERATE_DATE_ARRAY('2017-11-21', '2017-12-21', INTERVAL 1 WEEK) AS date_array;
+-------------------------------------------------------------+ | date_array | +-------------------------------------------------------------+ | [2017-11-21, 2017-11-28, 2017-12-05, 2017-12-12, 2017-12-19 | +-------------------------------------------------------------+
Accessing array elements
Consider the following table, sequences
:
+---------------------+ | some_numbers | +---------------------+ | [0, 1, 1, 2, 3, 5] | | [2, 4, 8, 16, 32] | | [5, 10] | +---------------------+
This table contains the column some_numbers
of the ARRAY
data type. To access elements from the arrays in this column, you must specify which type of indexing you want to use: either OFFSET
, for zero-based indexes, or ORDINAL
, for one-based indexes.
WITH sequences AS (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT [5, 10] AS some_numbers) SELECT some_numbers, some_numbers[OFFSET(1)] AS offset_1, some_numbers[ORDINAL(1)] AS ordinal_1 FROM sequences;
+--------------------+----------+-----------+ | some_numbers | offset_1 | ordinal_1 | +--------------------+----------+-----------+ | [0, 1, 1, 2, 3, 5] | 1 | 0 | | [2, 4, 8, 16, 32] | 4 | 2 | | [5, 10] | 10 | 5 | +--------------------+----------+-----------+
Note:OFFSET()
and ORDINAL()
will raise errors if the index is out of range. To avoid this, you can use SAFE_OFFSET()
or SAFE_ORDINAL()
to return NULL
instead of raising an error.
WITH sequences AS (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT [5, 10] AS some_numbers) SELECT some_numbers, some_numbers[SAFE_OFFSET(2)] AS offset_1, some_numbers[SAFE_ORDINAL(2)] AS ordinal_1 FROM sequences;
+--------------------+----------+-----------+ | some_numbers | offset_1 | ordinal_1 | +--------------------+----------+-----------+ | [0, 1, 1, 2, 3, 5] | 1 | 1 | | [2, 4, 8, 16, 32] | 8 | 4 | | [5, 10] | NULL | 10 | +--------------------+----------+-----------+
Finding lengths
The ARRAY_LENGTH()
function returns the length of an array.
WITH sequences AS (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT [5, 10] AS some_numbers) SELECT some_numbers, ARRAY_LENGTH(some_numbers) AS len FROM sequences;
+--------------------+--------+ | some_numbers | len | +--------------------+--------+ | [0, 1, 1, 2, 3, 5] | 6 | | [2, 4, 8, 16, 32] | 5 | | [5, 10] | 2 | +--------------------+--------+
Converting elements in an array to rows in a table
To convert an ARRAY
into a set of rows, also known as “flattening,” use the UNNEST
operator. UNNEST
takes an ARRAY
and returns a table with a single row for each element in the ARRAY
.
Because UNNEST
destroys the order of the ARRAY
elements, you may wish to restore order to the table. To do so, use the optional WITH OFFSET
clause to return an additional column with the offset for each array element, then use the ORDER BY
clause to order the rows by their offset.
SELECT * FROM UNNEST(['foo', 'bar', 'baz', 'qux', 'corge', 'garply', 'waldo', 'fred']) AS element WITH OFFSET AS offset ORDER BY offset;
+----------+--------+ | element | offset | +----------+--------+ | foo | 0 | | bar | 1 | | baz | 2 | | qux | 3 | | corge | 4 | | garply | 5 | | waldo | 6 | | fred | 7 | +----------+--------+
To flatten an entire column of ARRAY
s while preserving the values of the other columns in each row, use a correlated cross join to join the table containing the ARRAY
column to the UNNEST
output of that ARRAY
column.
With a correlated join, the UNNEST
operator references the ARRAY
typed column from each row in the source table, which appears previously in the FROM
clause. For each row N
in the source table, UNNEST
flattens the ARRAY
from row N
into a set of rows containing the ARRAY
elements, and then the cross join joins this new set of rows with the single row N
from the source table.
The following example uses UNNEST
to return a row for each element in the array column. Because of the CROSS JOIN
, the id
column contains the id
values for the row in sequences
that contains each number.
WITH sequences AS (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers) SELECT id, flattened_numbers FROM sequences CROSS JOIN UNNEST(sequences.some_numbers) AS flattened_numbers;
+------+-------------------+ | id | flattened_numbers | +------+-------------------+ | 1 | 0 | | 1 | 1 | | 1 | 1 | | 1 | 2 | | 1 | 3 | | 1 | 5 | | 2 | 2 | | 2 | 4 | | 2 | 8 | | 2 | 16 | | 2 | 32 | | 3 | 5 | | 3 | 10 | +------+-------------------+
Note that for correlated cross joins the UNNEST
operator is optional and the CROSS JOIN
can be expressed as a comma-join. Using this shorthand notation, the above example becomes:
WITH sequences AS (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers) SELECT id, flattened_numbers FROM sequences, sequences.some_numbers AS flattened_numbers;
or
WITH sequences AS (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers) SELECT id, flattened_numbers FROM sequences, UNNEST(sequences.some_numbers) AS flattened_numbers;
Querying nested arrays
If a table contains an ARRAY
of STRUCT
s, you can flatten the ARRAY
to query the fields of the STRUCT
. You can also flatten ARRAY
type fields of STRUCT
values.
Querying STRUCT elements in an ARRAY
The following example uses UNNEST
with CROSS JOIN
to flatten an ARRAY
of STRUCT
s.
WITH races AS ( SELECT "800M" AS race, [STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as laps), STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as laps), STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as laps), STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as laps), STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as laps), STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as laps), STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as laps), STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as laps)] AS participants) SELECT race, participant FROM races r CROSS JOIN UNNEST(r.participants) as participant;
+------+---------------------------------------+ | race | participant | +------+---------------------------------------+ | 800M | {Rudisha, [23.4, 26.3, 26.4, 26.1]} | | 800M | {Makhloufi, [24.5, 25.4, 26.6, 26.1]} | | 800M | {Murphy, [23.9, 26, 27, 26]} | | 800M | {Bosse, [23.6, 26.2, 26.5, 27.1]} | | 800M | {Rotich, [24.7, 25.6, 26.9, 26.4]} | | 800M | {Lewandowski, [25, 25.7, 26.3, 27.2]} | | 800M | {Kipketer, [23.2, 26.1, 27.3, 29.4]} | | 800M | {Berian, [23.7, 26.1, 27, 29.3]} | +------+---------------------------------------+
You can find specific information from repeated fields. For example, the following query returns the fastest racer in an 800M race.
WITH races AS ( SELECT "800M" AS race, [STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as laps), STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as laps), STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as laps), STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as laps), STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as laps), STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as laps), STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as laps), STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as laps)] AS participants) SELECT race, (SELECT name FROM UNNEST(participants) ORDER BY ( SELECT SUM(duration) FROM UNNEST(laps) AS duration) ASC LIMIT 1) AS fastest_racer FROM races;
+------+---------------+ | race | fastest_racer | +------+---------------+ | 800M | Rudisha | +------+---------------+
Querying ARRAY-type fields in a STRUCT
You can also get information from nested repeated fields. For example, the following statement returns the runner who had the fastest lap in an 800M race.
+------+-------------------------+ | race | runner_with_fastest_lap | +------+-------------------------+ | 800M | Kipketer | +------+-------------------------+
Notice that the preceding query uses the comma operator (,
) to perform an implicit CROSS JOIN
. It is equivalent to the following example, which uses an explicit CROSS JOIN
.
WITH races AS ( SELECT "800M" AS race, [STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as laps), STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as laps), STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as laps), STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as laps), STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as laps), STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as laps), STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as laps), STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as laps)] AS participants) SELECT race, (SELECT name FROM UNNEST(participants) CROSS JOIN UNNEST(laps) AS duration ORDER BY duration ASC LIMIT 1) AS runner_with_fastest_lap FROM races;
Flattening arrays with a CROSS JOIN
excludes rows that have empty or NULL arrays. If you want to include these rows, use a LEFT JOIN
.
WITH races AS ( SELECT "800M" AS race, [STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as laps), STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as laps), STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as laps), STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as laps), STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as laps), STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as laps), STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as laps), STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as laps), STRUCT("Nathan" as name, ARRAY<FLOAT64>[] as laps), STRUCT("David" as name, NULL as laps)] AS participants) SELECT name, sum(duration) AS finish_time FROM races, races.participants LEFT JOIN participants.laps duration GROUP BY name;
+-------------+--------------------+ | name | finish_time | +-------------+--------------------+ | Murphy | 102.9 | | Rudisha | 102.19999999999999 | | David | NULL | | Rotich | 103.6 | | Makhloufi | 102.6 | | Berian | 106.1 | | Bosse | 103.4 | | Kipketer | 106 | | Nathan | NULL | | Lewandowski | 104.2 | +-------------+--------------------+
Creating arrays from subqueries
A common task when working with arrays is turning a subquery result into an array. In BigQuery, you can accomplish this using the ARRAY()
function.
For example, consider the following operation on the sequences
table:
WITH sequences AS (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT [5, 10] AS some_numbers) SELECT some_numbers, ARRAY(SELECT x * 2 FROM UNNEST(some_numbers) AS x) AS doubled FROM sequences;
+--------------------+---------------------+ | some_numbers | doubled | +--------------------+---------------------+ | [0, 1, 1, 2, 3, 5] | [0, 2, 2, 4, 6, 10] | | [2, 4, 8, 16, 32] | [4, 8, 16, 32, 64] | | [5, 10] | [10, 20] | +--------------------+---------------------+
This example starts with a table named sequences. This table contains a column, some_numbers
, of type ARRAY<INT64>
.
The query itself contains a subquery. This subquery selects each row in the some_numbers
column and uses UNNEST
to return the array as a set of rows. Next, it multiplies each value by two, and then recombines the rows back into an array using the ARRAY()
operator.
Filtering arrays
The following example uses a WHERE
clause in the ARRAY()
operator’s subquery to filter the returned rows.
Note: In the following examples, the resulting rows are not ordered.
WITH sequences AS (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT [5, 10] AS some_numbers) SELECT ARRAY(SELECT x * 2 FROM UNNEST(some_numbers) AS x WHERE x < 5) AS doubled_less_than_five FROM sequences;
+------------------------+ | doubled_less_than_five | +------------------------+ | [0, 2, 2, 4, 6] | | [4, 8] | | [] | +------------------------+
Notice that the third row contains an empty array, because the elements in the corresponding original row ([5, 10]
) did not meet the filter requirement of x < 5
.
You can also filter arrays by using SELECT DISTINCT
to return only unique elements within an array.
WITH sequences AS (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers) SELECT ARRAY(SELECT DISTINCT x FROM UNNEST(some_numbers) AS x) AS unique_numbers FROM sequences;
+-----------------+ | unique_numbers | +-----------------+ | [0, 1, 2, 3, 5] | +-----------------+
WITH sequences AS (SELECT [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT [5, 10] AS some_numbers) SELECT ARRAY(SELECT x FROM UNNEST(some_numbers) AS x WHERE 2 IN UNNEST(some_numbers)) AS contains_two FROM sequences;
+--------------------+ | contains_two | +--------------------+ | [0, 1, 1, 2, 3, 5] | | [2, 4, 8, 16, 32] | | [] | +--------------------+
Notice again that the third row contains an empty array, because the array in the corresponding original row ([5, 10]
) did not contain 2
.
Scanning arrays
To check if an array contains a specific value, use the IN
operator with UNNEST
. To check if an array contains a value matching a condition, use the EXISTS
operator with UNNEST
.
Scanning for specific values
To scan an array for a specific value, use the IN
operator with UNNEST
.
The following example returns true
if the array contains the number 2.
SELECT 2 IN UNNEST([0, 1, 1, 2, 3, 5]) AS contains_value;
+----------------+ | contains_value | +----------------+ | true | +----------------+
To return the rows of a table where the array column contains a specific value, filter the results of IN UNNEST
using the WHERE
clause.
The following example returns the id
value for the rows where the array column contains the value 2.
WITH sequences AS (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers) SELECT id AS matching_rows FROM sequences WHERE 2 IN UNNEST(sequences.some_numbers) ORDER BY matching_rows;
+---------------+ | matching_rows | +---------------+ | 1 | | 2 | +---------------+
Scanning for values that satisfy a condition
To scan an array for values that match a condition, use UNNEST
to return a table of the elements in the array, use WHERE
to filter the resulting table in a subquery, and use EXISTS
to check if the filtered table contains any rows.
The following example returns the id
value for the rows where the array column contains values greater than 5.
WITH Sequences AS ( SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers ) SELECT id AS matching_rows FROM Sequences WHERE EXISTS(SELECT * FROM UNNEST(some_numbers) AS x WHERE x > 5);
+---------------+ | matching_rows | +---------------+ | 2 | | 3 | +---------------+
Scanning for STRUCT field values that satisfy a condition
To search an array of STRUCT
s for a field whose value matches a condition, use UNNEST
to return a table with a column for each STRUCT
field, then filter non-matching rows from the table using WHERE EXISTS
.
The following example returns the rows where the array column contains a STRUCT
whose field b
has a value greater than 3.
WITH Sequences AS ( SELECT 1 AS id, [STRUCT(0 AS a, 1 AS b)] AS some_numbers UNION ALL SELECT 2 AS id, [STRUCT(2 AS a, 4 AS b)] AS some_numbers UNION ALL SELECT 3 AS id, [STRUCT(5 AS a, 3 AS b), STRUCT(7 AS a, 4 AS b)] AS some_numbers ) SELECT id AS matching_rows FROM Sequences WHERE EXISTS(SELECT 1 FROM UNNEST(some_numbers) WHERE b > 3);
+---------------+ | matching_rows | +---------------+ | 2 | | 3 | +---------------+