TimeSeries objects

class TimeSeries

A subclass of MaskedArray designed to manipulate time series.

Parameters:
data : {array_like}

Data portion of the array. Any data that is valid for constructing a MaskedArray can be used here:

  • a sequence of objects (numbers, characters, objects);
  • an ndarray or one of its subclasses. In particular, MaskedArray and TimeSeries are recognized.
dates : {DateArray}

A DateArray instance storing the date information.

autosort : {True, False}, optional

Whether to sort the series in chronological order.

**optional_parameters :

All the parameters recognized by MaskedArray are also recognized by TimeSeries.

See also

MaskedArray

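A TimeSeries object is the combination of three ndarrays:

  • dates: a DateArray object storing the date information;
  • data: an ndarray holding the values;
  • mask: a boolean ndarray flagging missing or invalid values.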

These three arrays can be accessed as attributes of a TimeSeries object. Another very useful attribute is series, which gives direct access to the data and mask as a masked array.

As TimeSeries objects subclass MaskedArray, they inherit all their attributes and methods, as well as the attributes and methods of regular ndarrays.

Attributes

... specific to TimeSeries

data

Returns a view of a TimeSeries as an ndarray. This attribute is read-only and cannot be directly set.

mask

Returns the mask of the object, as an ndarray with the same shape as data, or as the special value nomask (equivalent to False). This attribute is writable and can be modified.

If data has a standard dtype (no named fields), the dtype of the mask is boolean. If data is a structured array with named fields, the mask has the same structure as the data, but each field is atomically boolean.

In any case, a value of True in the mask indicates that the corresponding value of the series is invalid.
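The mask dtype rule can be checked on a plain MaskedArray, from which TimeSeries inherits. The following is a minimal sketch using numpy.ma only; the variable names are illustrative and not part of the TimeSeries API.

```python
import numpy as np
import numpy.ma as ma

# Standard dtype: the mask is a boolean array with the same shape as the data.
plain = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
plain_mask_dtype = plain.mask.dtype

# Structured dtype: the mask mirrors the field names, and each field is boolean.
records = ma.array(
    [(1, 2.0), (3, 4.0)],
    dtype=[("a", int), ("b", float)],
    mask=[(True, False), (False, False)],
)
record_mask_names = records.mask.dtype.names
record_field_dtype = records.mask["a"].dtype

print(plain_mask_dtype, record_mask_names, record_field_dtype)
```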

series

Returns a view of a TimeSeries as a MaskedArray. This attribute is read-only and cannot be directly set.

dates

Returns the DateArray object of the dates of the series. This attribute is writable and can be modified. However, the size of the array must be zero or match either the size of the series or its length.

varshape

Returns the number of equivalent variables for each date. If varshape == (), the series has only one variable and is called a 1V-series.

... inherited from MaskedArray

TimeSeries.fill_value Filling value.
TimeSeries.baseclass Class of the underlying data (read-only).
TimeSeries.recordmask Return the mask of the records.
TimeSeries.hardmask Hardness of the mask.
TimeSeries.sharedmask Share status of the mask (read-only).
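As an illustration of fill_value and hardmask, the sketch below uses a plain MaskedArray, assuming that a TimeSeries behaves the same way for these inherited attributes.

```python
import numpy.ma as ma

x = ma.array([1.0, 2.0, 3.0], mask=[False, True, False], fill_value=-999.0)

# filled() replaces masked entries with fill_value and returns a plain ndarray.
filled = x.filled()

# With a hard mask, assigning to a masked entry does not unmask it.
x.harden_mask()
x[1] = 5.0
still_masked = bool(x.mask[1])

print(filled, x.hardmask, still_masked)
```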

... inherited from ndarray

TimeSeries.base Base object if memory is from some other object.
TimeSeries.ctypes An object to simplify the interaction of the array with the ctypes module.
TimeSeries.dtype Data-type of the array’s elements.
TimeSeries.flags Information about the memory layout of the array.
TimeSeries.itemsize Length of one array element in bytes.
TimeSeries.nbytes Total bytes consumed by the elements of the array.
TimeSeries.ndim Number of array dimensions.
TimeSeries.shape Tuple of array dimensions.
TimeSeries.size Number of elements in the array.
TimeSeries.strides Tuple of bytes to step in each dimension when traversing an array.
TimeSeries.imag Imaginary part.
TimeSeries.real Real part.
TimeSeries.flat Flat version of the array.

Construction

To construct a TimeSeries object, the simplest method is to directly call the class constructor with the proper parameters.

However, the recommended way is to use the time_series factory function.

time_series(data, dates=None, start_date=None, length=None, freq=None, mask=False, dtype=None, copy=False, fill_value=None, keep_mask=True, hard_mask=False, autosort=True)

Creates a TimeSeries object.

The data parameter can be a valid TimeSeries object. In that case, the dates, start_date or freq parameters are optional: if none of them is given, the dates of the result are the dates of data.

If data is not a TimeSeries, then dates must be either None or an object recognized by the date_array function (used internally):

  • an existing DateArray object;
  • a sequence of Date objects with the same frequency;
  • a sequence of datetime.datetime objects;
  • a sequence of dates in string format;
  • a sequence of integers corresponding to the representation of Date objects.

In any of the last four possibilities, the freq parameter is mandatory.

If dates is None, a continuous DateArray is automatically constructed as an array of size len(data) starting at start_date and with a frequency freq.
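The idea of a continuous date range built from a start date and the data length can be sketched with the standard library alone. The helper continuous_daily_dates is hypothetical and assumes a daily frequency; it is not how time_series builds its DateArray internally.

```python
from datetime import date, timedelta

def continuous_daily_dates(start, n):
    """Return n consecutive calendar days starting at start (hypothetical helper)."""
    return [start + timedelta(days=i) for i in range(n)]

data = [1, 2, 3, 4]
# One date per data entry, starting at the first date.
dates = continuous_daily_dates(date(2009, 1, 1), len(data))

print(dates[0], dates[-1])
```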

Parameters:

data : array_like

Data portion of the array. Any data that is valid for constructing a MaskedArray can be used here. data can also be a TimeSeries object.

dates : {None, var}, optional

A sequence of dates corresponding to each entry.

start_date : {Date}, optional

Date corresponding to the first entry of the data (index 0). This parameter must be a valid Date object, and is mandatory if dates is None and if data has a length greater than or equal to 1.

length : {integer}, optional

Length of the dates.

freq : {freq_spec}, optional

A valid frequency specification, as a string or an integer. This parameter is mandatory if dates is None. Otherwise, the frequency of the series is set to the frequency of the dates input.

See also

numpy.ma.masked_array
Constructor for the MaskedArray class.
scikits.timeseries.date_array
Constructor for the DateArray class.

Notes

  • All other parameters recognized by the numpy.ma.array constructor are also recognized by the function.
  • If data is zero-sized, only the freq parameter is mandatory.

Note

By default, the series is automatically sorted in chronological order. This behavior can be overridden by setting the keyword autosort=False.
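Chronological sorting amounts to reordering the values with the permutation that sorts the dates. A minimal sketch with plain NumPy arrays, assuming dates are represented as integer ordinals (the variable names are illustrative only):

```python
import numpy as np

# Date ordinals (for example, day numbers) given out of order, with matching values.
date_ordinals = np.array([3, 1, 4, 2])
values = np.array([30.0, 10.0, 40.0, 20.0])

# A stable argsort keeps the relative order of entries sharing the same date.
order = np.argsort(date_ordinals, kind="stable")
sorted_dates = date_ordinals[order]
sorted_values = values[order]

print(sorted_dates, sorted_values)
```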

Dates and data compatibility

The simplest example of a TimeSeries consists of a series of one variable, where a date is associated with each element of the array. In that case, the dates attribute is a DateArray with the same size as the underlying array.

For example, we can create a 4-element series:

>>> import numpy as np
>>> import scikits.timeseries as ts
>>> first_date = ts.Date('D', '2009-01-01')
>>> series = ts.time_series([1, 2, 3, 4], start_date=first_date)
>>> series
timeseries([1 2 3 4],
   dates = [01-Jan-2009 ... 04-Jan-2009],
   freq  = D)

Note that with the use of the start_date keyword, the size of the dates attribute is automatically adjusted by time_series to match the size of the input data.

The dates can now be modified in place. For example, they can be shifted by one week with the following command.

>>> series.dates += 7
>>> series
timeseries([1 2 3 4],
   dates = [08-Jan-2009 ... 11-Jan-2009],
   freq  = D)

The dates can also be changed by setting the dates attribute to another DateArray object. In that case, the size of the new dates must match the size of the series, or a TimeSeriesCompatibilityError is raised. Setting the dates attribute to an object of a different type raises a TypeError exception.

It is often convenient to manipulate a series of several variables at once. One possibility is to use a structured array as input, as illustrated by the following example:

>>> series = ts.time_series(list(zip(np.random.normal(0, 1, 10),
...                                  np.random.uniform(0, 1, 10))),
...                         dtype=[('norm', float), ('unif', float)],
...                         start_date=ts.Date('D', '2001-01-01'))

In this example, series consists of two fields ('norm' and 'unif'). Here the two fields have the same type (float), but this is not a requirement. Each field can be accessed as an independent TimeSeries using series['norm'] and series['unif'].

In practice, each individual entry of series is a numpy.void object. The series as a whole behaves as a 1D masked array, as shown by its shape: series.shape == (10,). Because series is a 1D array, the size of series.dates must match series.size.

Despite the convenience of this approach to manipulate multi-variable series, it presents a serious disadvantage: structured arrays are usually not recognized by standard numpy functions.

An alternative is then to represent a series as a two-dimensional array, using columns as variables and rows as actual observations. In that case, all the variables must have the same type, and the size of the dates attribute must match the length of the series.
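The two layouts can be compared with plain NumPy arrays. This sketch leaves out the dates; the names records and table are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structured layout: one record per observation, one named field per variable.
records = np.zeros(10, dtype=[("norm", float), ("unif", float)])
records["norm"] = rng.normal(0, 1, 10)
records["unif"] = rng.uniform(0, 1, 10)
entry_type = type(records[0])  # each entry is a numpy.void record

# 2D layout: rows are observations, columns are variables of a single dtype.
table = np.column_stack([records["norm"], records["unif"]])
column_means = table.mean(axis=0)  # one mean per variable

print(records.shape, table.shape, column_means.shape)
```

The structured array is 1D with shape (10,), while the 2D table has shape (10, 2) and works directly with axis-based reductions such as mean.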

More generally, it is possible to create a multi-variable series as a nD array. The corresponding dates must then satisfy the condition series.dates.size == series.shape[0] or a TimeSeriesCompatibilityError is raised. The specific attribute varshape is then set to keep track of the number of variables.

For example, a series of 50 years of monthly data can be represented as a (600,)-array of observations at a monthly frequency, or as a (50,12)-array of observations at an annual frequency.

>>> start = ts.Date('M', '2001-01')
>>> data = np.random.uniform(-1, +1, 50*12).reshape(50, 12)
>>> mseries = ts.time_series(data, start_date=start, length=50*12)
>>> aseries = ts.time_series(data, start_date=start.asfreq('Y'), length=50)

Both series have the same shape, (50, 12), but mseries is a series of one variable, with mseries.varshape == (), while aseries is a series of 12 variables, aseries.varshape == (12,), each variable corresponding to a month.

>>> (mseries.shape, mseries.varshape)
((50, 12), ())
>>> (aseries.shape, aseries.varshape)
((50, 12), (12,))
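The relation between the data shape, the number of dates, and varshape can be summarized in a small function. This is a sketch of the rule as described in this section, not the library's implementation: infer_varshape is a hypothetical helper, and ValueError stands in for TimeSeriesCompatibilityError.

```python
import math

def infer_varshape(shape, ndates):
    """Return the variable shape implied by a data shape and a number of dates.

    Hypothetical helper: one date per element gives a 1V-series, one date per
    row gives a multi-variable series, and anything else is incompatible.
    """
    if ndates == math.prod(shape):
        return ()
    if ndates == shape[0]:
        return tuple(shape[1:])
    raise ValueError(
        "incompatible sizes: %d dates for data of shape %r" % (ndates, shape)
    )

print(infer_varshape((50, 12), 600))         # monthly dates, one variable
print(infer_varshape((50, 12), 50))          # annual dates, 12 variables
print(infer_varshape((365, 256, 256), 365))  # daily raster maps
```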

Because aseries is basically a 2D array, we can easily compute annual and monthly means. Thus, monthly means over the whole 50 years can be calculated at once with the mean method, using axis=0 as parameter. We can also compute the equivalent of 50 years of annual data using the mean method, this time with axis=1.

>>> amean = aseries.mean(axis=1)
>>> amean.shape
(50,)
>>> mmean = aseries.mean(axis=0)
>>> mmean.shape
(12,)

Another example of a multi-variable series would be one year of daily (256x256) raster maps. This dataset can easily be represented as a (365,256,256)-array, and a corresponding series created with the following code:

>>> data = np.random.uniform(-1, +1, 365*256*256).reshape(365, 256, 256)
>>> newseries = ts.time_series(data, start_date=ts.now('D'))

Methods

Date information

The following methods access information about the dates attribute:

TimeSeries.get_steps()
Returns the time steps between consecutive dates, in the same unit as the instance frequency.
TimeSeries.has_missing_dates()
Returns whether the instance has missing dates.
TimeSeries.has_duplicated_dates()
Returns whether the instance has duplicated dates.
TimeSeries.is_full()
Returns whether the instance has no missing dates.
TimeSeries.is_valid()
Returns whether the instance is valid (that is, there are neither missing nor duplicated dates).
TimeSeries.is_chronological()
Returns whether the instance is sorted in chronological order.
TimeSeries.date_to_index(date) Returns the index corresponding to a given date, as an integer.
TimeSeries.sort_chronologically() Sort the series by chronological order (in place).
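The checks above can be understood in terms of the steps between consecutive dates. The sketch below works on plain integer ordinals and assumes that a step of 0 means a duplicated date and a step greater than 1 means a missing date. All function names are hypothetical stand-ins, not the TimeSeries methods themselves.

```python
import numpy as np

def date_steps(ordinals):
    """Steps between consecutive dates, computed after sorting."""
    return np.diff(np.sort(np.asarray(ordinals)))

def has_missing_dates(ordinals):
    return bool((date_steps(ordinals) > 1).any())

def has_duplicated_dates(ordinals):
    return bool((date_steps(ordinals) == 0).any())

def is_chronological(ordinals):
    # Checked on the dates as given, without sorting first.
    return bool((np.diff(np.asarray(ordinals)) >= 0).all())

gappy = [1, 2, 4, 4, 5]  # day 3 is missing, day 4 appears twice
full = [1, 2, 3, 4]

print(date_steps(gappy), has_missing_dates(gappy), has_duplicated_dates(gappy))
```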

Dates manipulation

TimeSeries.adjust_endpoints(a[, start_date, ...]) Returns a TimeSeries going from start_date to end_date.
TimeSeries.compressed(series) Suppresses missing values from a time series.
TimeSeries.fill_missing_dates(data[, dates, ...]) Finds and fills the missing dates in a time series.

Shape manipulation

For reshape, resize, and transpose, the single tuple argument may be replaced with n integers which will be interpreted as an n-tuple.
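The equivalence between the tuple form and the n-integer form can be verified on a plain ndarray. This sketch assumes that TimeSeries follows the same calling convention for these inherited methods.

```python
import numpy as np

a = np.arange(6)

# reshape accepts either a single tuple or the dimensions as separate integers.
by_tuple = a.reshape((2, 3))
by_ints = a.reshape(2, 3)

# transpose accepts the axes in the same two forms.
t_tuple = by_tuple.transpose((1, 0))
t_ints = by_tuple.transpose(1, 0)

print(by_ints.shape, t_ints.shape)
```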

TimeSeries.flatten(series) Flattens a (multi-) time series to 1D series.
TimeSeries.ravel() Returns a ravelled view of the instance.
TimeSeries.reshape(*newshape, **kwargs) Returns a time series containing the data of a, but with a new shape.
TimeSeries.resize(newshape[, refcheck, order])
TimeSeries.split() Split a multi-dimensional series into individual columns.
TimeSeries.squeeze([axis]) Remove single-dimensional entries from the shape of a.
TimeSeries.swapaxes(axis1, axis2) Return a view of the array with axis1 and axis2 interchanged.
TimeSeries.transpose(*axes) Permute the dimensions of an array.
TimeSeries.T

Item selection and manipulation

TimeSeries.argmax([axis, fill_value, out]) Returns array of indices of the maximum values along the given axis.
TimeSeries.argmin([axis, fill_value, out]) Return array of indices to the minimum values along the given axis.
TimeSeries.argsort([axis, kind, order, ...]) Return an ndarray of indices that sort the array along the specified axis.
TimeSeries.choose(choices[, out, mode]) Use an index array to construct a new array from a set of choices.
TimeSeries.compress Return a where condition is True.
TimeSeries.diagonal([offset, axis1, axis2]) Return specified diagonals.
TimeSeries.fill(value) Fill the array with a scalar value.
TimeSeries.filled([fill_value]) Returns an array of the same class as _data, with masked values filled with fill_value.
TimeSeries.item(*args) Copy an element of an array to a standard Python scalar and return it.
TimeSeries.nonzero() Return the indices of unmasked elements that are not zero.
TimeSeries.put(indices, values[, mode]) Set storage-indexed locations to corresponding values.
TimeSeries.repeat(repeats[, axis]) Repeat elements of an array.
TimeSeries.searchsorted(v[, side, sorter]) Find indices where elements of v should be inserted in a to maintain order.
TimeSeries.sort([axis, kind, order, ...]) Sort the array, in-place.
TimeSeries.take(indices[, axis, out, mode])
TimeSeries.tshift(series, nper[, copy]) Returns a series of the same size as series, with the same start_date and end_date, but values shifted by nper.
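To make the idea behind tshift concrete, the sketch below shifts the values of a masked array while keeping its size fixed and masks the positions left vacant. shift_values is a hypothetical stand-in, and its sign convention (negative nper moves values toward later positions) is an assumption of this sketch, not something stated in this section.

```python
import numpy.ma as ma

def shift_values(series, nper):
    """Shift values by nper positions, masking the vacated slots.

    Assumed convention: negative nper moves values toward later positions,
    positive nper toward earlier ones.
    """
    result = ma.masked_all(series.shape, dtype=series.dtype)
    if nper == 0:
        result[:] = series
    elif nper < 0:
        result[-nper:] = series[:nper]
    else:
        result[:-nper] = series[nper:]
    return result

x = ma.array([0, 1, 2, 3])
later = shift_values(x, -1)
earlier = shift_values(x, 1)

print(later, earlier)
```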

Pickling and copy

TimeSeries.copy([order]) Return a copy of the array.
TimeSeries.dump(file) Dump a pickle of the array to the specified file.
TimeSeries.dumps() Returns the pickle of the array as a string.
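A pickle round trip can also be done with the standard pickle module. The sketch below uses a plain MaskedArray and only checks that data and mask survive; it says nothing about how a TimeSeries serializes its dates.

```python
import pickle

import numpy.ma as ma

original = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# Serialize to bytes and restore; the mask travels with the data.
restored = pickle.loads(pickle.dumps(original))

print(restored)
```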

Calculations

TimeSeries.all Check if all of the elements of a are true.
TimeSeries.anom Compute the anomalies (deviations from the arithmetic mean) along the given axis.
TimeSeries.any Check if any of the elements of a are true.
TimeSeries.clip([min, max, out]) Return an array whose values are limited to [min, max].
TimeSeries.conj() Complex-conjugate all elements.
TimeSeries.conjugate() Return the complex conjugate, element-wise.
TimeSeries.cumprod Return the cumulative product of the elements along the given axis.
TimeSeries.cumsum Return the cumulative sum of the elements along the given axis.
TimeSeries.max([axis, fill_value]) Return the maximum of self along the given axis.
TimeSeries.mean Returns the average of the array elements.
TimeSeries.min([axis, fill_value]) Return the minimum of self along the given axis.
TimeSeries.pct(series[, nper]) Returns the rolling percentage change of the series.
TimeSeries.pct_log(series[, nper]) Returns the rolling log percentage change of the series.
TimeSeries.pct_symmetric(series[, nper]) Returns the rolling symmetric percentage change of the series.
TimeSeries.prod Return the product of the array elements over the given axis.
TimeSeries.product([axis, dtype, out]) Return the product of the array elements over the given axis.
TimeSeries.ptp([axis, out, fill_value]) Return (maximum - minimum) along the given dimension (i.e. peak-to-peak value).
TimeSeries.round([decimals, out]) Return a with each element rounded to the given number of decimals.
TimeSeries.std Compute the standard deviation along the specified axis.
TimeSeries.sum Return the sum of the array elements over the given axis.
TimeSeries.trace([offset, axis1, axis2, ...]) Return the sum along diagonals of the array.
TimeSeries.var Compute the variance along the specified axis.
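Two of the calculations listed above, anom and pct, can be illustrated on a plain masked array. Both functions are hypothetical stand-ins. The percentage-change formula 100 * (x[t] / x[t - nper] - 1), with the first nper entries masked, is an assumption of this sketch rather than a statement of the library's exact definition.

```python
import numpy.ma as ma

def anomalies(x):
    """Deviations from the arithmetic mean (1D case only, for simplicity)."""
    return x - x.mean()

def pct_change(x, nper=1):
    """Rolling percentage change over nper periods (requires nper >= 1)."""
    result = ma.masked_all(x.shape, dtype=float)
    # The first nper entries have no earlier value to compare with and stay masked.
    result[nper:] = 100.0 * (x[nper:] / x[:-nper] - 1.0)
    return result

a = anomalies(ma.array([1.0, 2.0, 3.0]))
p = pct_change(ma.array([100.0, 110.0, 121.0]))

print(a, p)
```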