TimeSeries objects¶
- class TimeSeries¶
A subclass of MaskedArray designed to manipulate time series.
Parameters: - data : {array_like}
Data portion of the array. Any data that is valid for constructing a MaskedArray can be used here:
- a sequence of objects (numbers, characters, objects);
- an ndarray or one of its subclasses. In particular, MaskedArray and TimeSeries are recognized.
- dates : {DateArray}
A DateArray instance storing the date information.
- autosort : {True, False}, optional
Whether to sort the series in chronological order.
- **optional_parameters :
All the parameters recognized by MaskedArray are also recognized by TimeSeries.
See also
MaskedArray
A TimeSeries object is the combination of three ndarrays:
- dates: A DateArray object.
- data : An ndarray.
- mask : A boolean ndarray, indicating missing or invalid data.
These three arrays can be accessed as attributes of a TimeSeries object. Another very useful attribute is series, which gives direct access to data and mask as a single masked array.
As TimeSeries objects subclass MaskedArray, they inherit all their attributes and methods, as well as the attributes and methods of regular ndarrays.
Attributes¶
... specific to TimeSeries¶
- data¶
Returns a view of a TimeSeries as an ndarray. This attribute is read-only and cannot be directly set.
- mask¶
Returns the mask of the object, as an ndarray with the same shape as data, or as the special value nomask (equivalent to False). This attribute is writable and can be modified.
If data has a standard dtype (no named fields), the dtype of the mask is boolean. If data is a structured array with named fields, the mask has the same structure as the data, but each field is atomically boolean.
In any case, a value of True in the mask indicates that the corresponding value of the series is invalid.
- series¶
Returns a view of a TimeSeries as a MaskedArray. This attribute is read-only and cannot be directly set.
- dates¶
Returns the DateArray object of the dates of the series. This attribute is writable and can be modified. However, the size of the array must be zero or match either the size of the series or its length.
- varshape¶
Returns the number of equivalent variables for each date. If varshape == (), the series has only one variable and is called a 1V-series.
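The mask convention described above (True marks an invalid entry) can be illustrated without the library. The following is a minimal sketch in plain Python: the lists stand in for the data and mask arrays, and the helper name valid_values is hypothetical, not part of the TimeSeries API.

```python
# Plain-Python illustration of the mask convention: True means "invalid".
data = [1.0, 2.0, -999.0, 4.0]
mask = [False, False, True, False]   # the third observation is missing


def valid_values(data, mask):
    """Return only the entries whose mask flag is False."""
    return [value for value, masked in zip(data, mask) if not masked]


valid = valid_values(data, mask)
# Statistics should only see the unmasked entries.
mean_of_valid = sum(valid) / len(valid)
```

The placeholder -999.0 never reaches the mean, which is the practical benefit of keeping the mask next to the data.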
... direct access to the dates information¶
| freq | freqstr |
| year | years |
| qyear | |
| quarter | quarters |
| month | months |
| week | weeks |
| day | days |
| day_of_week | weekdays |
| day_of_year | yeardays |
| hour | hours |
| minute | minutes |
| second | seconds |
| start_date | end_date |
... inherited from MaskedArray¶
| TimeSeries.fill_value | Filling value. |
| TimeSeries.baseclass | Class of the underlying data (read-only). |
| TimeSeries.recordmask | Return the mask of the records. |
| TimeSeries.hardmask | Hardness of the mask |
| TimeSeries.sharedmask | Share status of the mask (read-only). |
... inherited from ndarray¶
| TimeSeries.base | Base object if memory is from some other object. |
| TimeSeries.ctypes | An object to simplify the interaction of the array with the ctypes module. |
| TimeSeries.dtype | Data-type of the array’s elements. |
| TimeSeries.flags | Information about the memory layout of the array. |
| TimeSeries.itemsize | Length of one array element in bytes. |
| TimeSeries.nbytes | Total bytes consumed by the elements of the array. |
| TimeSeries.ndim | Number of array dimensions. |
| TimeSeries.shape | Tuple of array dimensions. |
| TimeSeries.size | Number of elements in the array. |
| TimeSeries.strides | Tuple of bytes to step in each dimension when traversing an array. |
| TimeSeries.imag | Imaginary part. |
| TimeSeries.real | Real part |
| TimeSeries.flat | Flat version of the array. |
Construction¶
To construct a TimeSeries object, the simplest method is to directly call the class constructor with the proper parameters.
However, the recommended way is to use the time_series factory function.
- time_series(data, dates=None, start_date=None, length=None, freq=None, mask=False, dtype=None, copy=False, fill_value=None, keep_mask=True, hard_mask=False, autosort=True)¶
Creates a TimeSeries object.
The data parameter can be a valid TimeSeries object. In that case, the dates, start_date or freq parameters are optional: if none of them is given, the dates of the result are the dates of data.
If data is not a TimeSeries, then dates must be either None or an object recognized by the date_array function (used internally):
- an existing DateArray object;
- a sequence of Date objects with the same frequency;
- a sequence of datetime.datetime objects;
- a sequence of dates in string format;
- a sequence of integers corresponding to the representation of Date objects.
In any of the last four possibilities, the freq parameter is mandatory.
If dates is None, a continuous DateArray is automatically constructed as an array of size len(data) starting at start_date and with a frequency freq.
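As a rough illustration of the dates=None case, the sketch below builds one date per data entry from a starting date, assuming a daily frequency. It uses only the standard datetime module; continuous_daily_dates is a hypothetical helper and does not reproduce how date_array works internally.

```python
from datetime import date, timedelta


def continuous_daily_dates(start, n):
    """Return n consecutive calendar days beginning at `start`."""
    return [start + timedelta(days=i) for i in range(n)]


data = [1, 2, 3, 4]
# One date per data entry, starting at the requested first date.
dates = continuous_daily_dates(date(2009, 1, 1), len(data))
```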
Parameters: data : array_like
Data portion of the array. Any data that is valid for constructing a MaskedArray can be used here. data can also be a TimeSeries object.
dates : {None, var}, optional
A sequence of dates corresponding to each entry.
start_date : {Date}, optional
Date corresponding to the first entry of the data (index 0). This parameter must be a valid Date object, and is mandatory if dates is None and if data has a length greater than or equal to 1.
length : {integer}, optional
Length of the dates.
freq : {freq_spec}, optional
A valid frequency specification, as a string or an integer. This parameter is mandatory if dates is None. Otherwise, the frequency of the series is set to the frequency of the dates input.
See also
- numpy.ma.masked_array
- Constructor for the MaskedArray class.
- scikits.timeseries.date_array
- Constructor for the DateArray class.
Notes
- All other parameters recognized by the numpy.ma.array constructor are also recognized by the function.
- If data is zero-sized, only the freq parameter is mandatory.
Note
By default, the series is automatically sorted in chronological order. This behavior can be overridden by setting the keyword autosort=False.
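Chronological sorting means that the dates and the data are reordered together, so each value stays attached to its date. A minimal sketch of that idea in plain Python follows; sorted_by_date is a hypothetical helper, not the library's implementation.

```python
from datetime import date


def sorted_by_date(dates, data):
    """Return (dates, data) reordered so that the dates are ascending."""
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    return [dates[i] for i in order], [data[i] for i in order]


dates = [date(2009, 1, 3), date(2009, 1, 1), date(2009, 1, 2)]
data = [30, 10, 20]
sorted_dates, sorted_data = sorted_by_date(dates, data)
```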
Dates and data compatibility¶
The simplest example of a TimeSeries consists of a series of one variable, where a date is associated with each element of the array. In that case, the dates attribute is a DateArray with the same size as the underlying array.
For example, we can create a 4-element series:
>>> import scikits.timeseries as ts
>>> first_date = ts.Date('D', '2009-01-01')
>>> series = ts.time_series([1, 2, 3, 4], start_date=first_date)
>>> series
timeseries([1 2 3 4],
dates = [01-Jan-2009 ... 04-Jan-2009],
freq = D)
Note that with the use of the start_date keyword, the size of the dates attribute is automatically adjusted by time_series to match the size of the input data.
The dates can now be modified in place. For example, they can be shifted by one week with the following command:
>>> series.dates += 7
>>> series
timeseries([1 2 3 4],
dates = [08-Jan-2009 ... 11-Jan-2009],
freq = D)
The dates can also be changed by setting the dates attribute to another DateArray object. In that case, the size of the new dates must match the size of the series, or a TimeSeriesCompatibilityError is raised. Setting the dates attribute to an object of a different type raises a TypeError exception.
It is often convenient to manipulate a series of several variables at once. One possibility is to use a structured array as input, as illustrated by the following example:
>>> import numpy as np
>>> series = ts.time_series(list(zip(np.random.normal(0, 1, 10),
...                                  np.random.uniform(0, 1, 10))),
...                         dtype=[('norm', float), ('unif', float)],
...                         start_date=ts.Date('D', '2001-01-01'))
In this example, series consists of two fields (‘norm’ and ‘unif’). Both fields happen to have the same type (float), but this is not a requirement. Each field can be accessed as an independent TimeSeries using series['norm'] and series['unif'].
In practice, each individual entry of series is a numpy.void object. The series as a whole behaves as a 1D masked array, as represented by the shape of the series: series.shape = (10,). Because series is a 1D array, the size of series.dates must match series.size.
Despite the convenience of this approach to manipulate multi-variable series, it presents a serious disadvantage: structured arrays are usually not recognized by standard numpy functions.
An alternative is then to represent a series as a two-dimensional array, using columns as variables and rows as actual observations. In that case, all the variables must have the same type, and the size of the dates attribute must match the length of the series.
More generally, it is possible to create a multi-variable series as a nD array. The corresponding dates must then satisfy the condition series.dates.size == series.shape[0] or a TimeSeriesCompatibilityError is raised. The specific attribute varshape is then set to keep track of the number of variables.
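The compatibility rule can be sketched as a small standalone function. This is an illustration under assumptions, not the library's code: infer_varshape is a hypothetical name, ValueError stands in for TimeSeriesCompatibilityError, and the rule that varshape equals the trailing dimensions of the data is inferred from the examples in this section.

```python
from math import prod


def infer_varshape(shape, ndates):
    """Guess the variable shape from the data shape and the number of dates.

    Assumed rule: one date per element means a single variable; one date
    per row means each row holds several variables.
    """
    if ndates == prod(shape):
        return ()
    if shape and ndates == shape[0]:
        return tuple(shape[1:])
    # Stand-in for the library's TimeSeriesCompatibilityError.
    raise ValueError(f"{ndates} dates do not match data of shape {shape}")


single = infer_varshape((50, 12), 600)   # one date per element
multi = infer_varshape((50, 12), 50)     # one date per row
```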
For example, a series of 50 years of monthly data can be represented as a (600,)-array of observations at a monthly frequency, or as a (50,12)-array of observations at an annual frequency.
>>> start = ts.Date('M', '2001-01')
>>> data = np.random.uniform(-1, +1, 50*12).reshape(50, 12)
>>> mseries = ts.time_series(data, start_date=start, length=50*12)
>>> aseries = ts.time_series(data, start_date=start.asfreq('Y'), length=50)
Both series have the same shape, (50, 12), but mseries is a series of one variable, with mseries.varshape == (), while aseries is a series of 12 variables, aseries.varshape == (12,), each variable corresponding to a month.
>>> (mseries.shape, mseries.varshape)
((50, 12), ())
>>> (aseries.shape, aseries.varshape)
((50, 12), (12,))
Because aseries is basically a 2D array, we can easily compute annual and monthly means. Monthly means over the whole 50 years can be calculated at once with the mean method, using axis=0 as parameter. We can also compute the equivalent of 50 years of annual data using the mean method, this time with axis=1.
>>> amean = aseries.mean(axis=1)
>>> amean.shape
(50,)
>>> mmean = aseries.mean(axis=0)
>>> mmean.shape
(12,)
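To make the two axes concrete, here is a small plain-Python sketch on a 3x4 table (3 years of 4 periods, a scaled-down stand-in for the 50x12 case). The variable names are illustrative only.

```python
# Rows are years, columns are periods within the year.
years = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

# Equivalent of mean(axis=1): one value per row (per year).
annual_means = [sum(row) / len(row) for row in years]

# Equivalent of mean(axis=0): one value per column (per period).
period_means = [sum(col) / len(col) for col in zip(*years)]
```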
Another example of a multi-variable series would be one year of daily (256x256) raster maps. This dataset can easily be represented as a (365,256,256)-array, and a corresponding series created with the following code:
>>> data = np.random.uniform(-1, +1, 365*256*256).reshape(365, 256, 256)
>>> newseries = ts.time_series(data, start_date=ts.now('D'))
Methods¶
Date information¶
The following methods access information about the dates attribute:
| | Returns the time steps between consecutive dates, in the same unit as the instance frequency. |
| | Returns whether the instance has missing dates. |
| | Returns whether the instance has duplicated dates. |
| | Returns whether the instance has no missing dates. |
| | Returns whether the instance is valid (that there are no missing nor duplicated dates). |
| | Returns whether the instance is sorted in chronological order. |
| TimeSeries.date_to_index(date) | Returns the index corresponding to a given date, as an integer. |
| TimeSeries.sort_chronologically() | Sort the series by chronological order (in place). |
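The checks described in this table can be approximated on plain integers, since dates have an integer representation at a given frequency. The sketch below is an assumption-laden illustration: steps_between and date_checks are hypothetical helpers, and computing the steps on the sorted values is a choice made here, not documented library behavior.

```python
def steps_between(values):
    """Differences between consecutive date values."""
    return [b - a for a, b in zip(values, values[1:])]


def date_checks(values):
    """Summarize a list of integer date values."""
    steps = steps_between(sorted(values))
    has_missing = any(step > 1 for step in steps)
    has_duplicated = any(step == 0 for step in steps)
    return {
        "has_missing": has_missing,
        "has_duplicated": has_duplicated,
        "is_full": not has_missing,
        "is_valid": not has_missing and not has_duplicated,
        "is_chronological": all(step >= 0 for step in steps_between(values)),
    }


gappy = date_checks([1, 2, 4, 4])   # a gap between 2 and 4, and 4 repeated
clean = date_checks([1, 2, 3])
shuffled = date_checks([3, 1, 2])
```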
Dates manipulation¶
| TimeSeries.adjust_endpoints(a[, start_date, ...]) | Returns a TimeSeries going from start_date to end_date. |
| TimeSeries.compressed(series) | Suppresses missing values from a time series. |
| TimeSeries.fill_missing_dates(data[, dates, ...]) | Finds and fills the missing dates in a time series. |
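Filling missing dates amounts to expanding the dates to a continuous range and marking the new positions as missing. The sketch below shows that idea on integer date values; fill_missing is a hypothetical helper, and using None for the new entries (where the library would presumably use masked values) is an assumption.

```python
def fill_missing(values, data):
    """Expand sorted, unique integer dates to a continuous range.

    Positions that had no observation receive None.
    """
    lookup = dict(zip(values, data))
    full = list(range(values[0], values[-1] + 1))
    return full, [lookup.get(value) for value in full]


full_dates, full_data = fill_missing([1, 2, 5], [10, 20, 50])
```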
Shape manipulation¶
For reshape, resize, and transpose, the single tuple argument may be replaced with n integers which will be interpreted as an n-tuple.
| TimeSeries.flatten(series) | Flattens a (multi-) time series to 1D series. |
| TimeSeries.ravel() | Returns a ravelled view of the instance. |
| TimeSeries.reshape(*newshape, **kwargs) | Returns a time series containing the data of a, but with a new shape. |
| TimeSeries.resize(newshape[, refcheck, order]) | |
| TimeSeries.split() | Split a multi-dimensional series into individual columns. |
| TimeSeries.squeeze([axis]) | Remove single-dimensional entries from the shape of a. |
| TimeSeries.swapaxes(axis1, axis2) | Return a view of the array with axis1 and axis2 interchanged. |
| TimeSeries.transpose(*axes) | Permute the dimensions of an array. |
| TimeSeries.T | |
Item selection and manipulation¶
| TimeSeries.argmax([axis, fill_value, out]) | Returns array of indices of the maximum values along the given axis. |
| TimeSeries.argmin([axis, fill_value, out]) | Return array of indices to the minimum values along the given axis. |
| TimeSeries.argsort([axis, kind, order, ...]) | Return an ndarray of indices that sort the array along the specified axis. |
| TimeSeries.choose(choices[, out, mode]) | Use an index array to construct a new array from a set of choices. |
| TimeSeries.compress | Return a where condition is True. |
| TimeSeries.diagonal([offset, axis1, axis2]) | Return specified diagonals. |
| TimeSeries.fill(value) | Fill the array with a scalar value. |
| TimeSeries.filled([fill_value]) | Returns an array of the same class as _data, with masked values filled with fill_value. |
| TimeSeries.item(*args) | Copy an element of an array to a standard Python scalar and return it. |
| TimeSeries.nonzero() | Return the indices of unmasked elements that are not zero. |
| TimeSeries.put(indices, values[, mode]) | Set storage-indexed locations to corresponding values. |
| TimeSeries.repeat(repeats[, axis]) | Repeat elements of an array. |
| TimeSeries.searchsorted(v[, side, sorter]) | Find indices where elements of v should be inserted in a to maintain order. |
| TimeSeries.sort([axis, kind, order, ...]) | Sort the array, in-place |
| TimeSeries.take(indices[, axis, out, mode]) | |
| TimeSeries.tshift(series, nper[, copy]) | Returns a series of the same size as series, with the same start_date and end_date, but values shifted by nper. |
Pickling and copy¶
| TimeSeries.copy([order]) | Return a copy of the array. |
| TimeSeries.dump(file) | Dump a pickle of the array to the specified file. |
| TimeSeries.dumps() | Returns the pickle of the array as a string. |
Calculations¶
| TimeSeries.all | Check if all of the elements of a are true. |
| TimeSeries.anom | Compute the anomalies (deviations from the arithmetic mean) along the given axis. |
| TimeSeries.any | Check if any of the elements of a are true. |
| TimeSeries.clip([min, max, out]) | Return an array whose values are limited to [min, max]. |
| TimeSeries.conj() | Complex-conjugate all elements. |
| TimeSeries.conjugate() | Return the complex conjugate, element-wise. |
| TimeSeries.cumprod | Return the cumulative product of the elements along the given axis. |
| TimeSeries.cumsum | Return the cumulative sum of the elements along the given axis. |
| TimeSeries.max([axis, fill_value]) | Return the maximum of self along the given axis. |
| TimeSeries.mean | Returns the average of the array elements. |
| TimeSeries.min([axis, fill_value]) | Return the minimum of self along the given axis. |
| TimeSeries.pct(series[, nper]) | Returns the rolling percentage change of the series. |
| TimeSeries.pct_log(series[, nper]) | Returns the rolling log percentage change of the series. |
| TimeSeries.pct_symmetric(series[, nper]) | Returns the rolling symmetric percentage change of the series. |
| TimeSeries.prod | Return the product of the array elements over the given axis. |
| TimeSeries.product([axis, dtype, out]) | Return the product of the array elements over the given axis. |
| TimeSeries.ptp([axis, out, fill_value]) | Return (maximum - minimum) along the given dimension. |
| TimeSeries.round([decimals, out]) | Return a with each element rounded to the given number of decimals. |
| TimeSeries.std | Compute the standard deviation along the specified axis. |
| TimeSeries.sum | Return the sum of the array elements over the given axis. |
| TimeSeries.trace([offset, axis1, axis2, ...]) | Return the sum along diagonals of the array. |
| TimeSeries.var | Compute the variance along the specified axis. |
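Two of the calculations listed above are easy to sketch in plain Python. The anomaly definition follows the table (deviation from the arithmetic mean). The percentage-change formula, value divided by the value nper steps earlier minus one, is an assumption made here, as are the unscaled result (0.1 rather than 10) and the use of None for the leading entries that have no earlier value. The function names are hypothetical.

```python
def anomalies(values):
    """Deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return [value - mean for value in values]


def pct_change(values, nper=1):
    """Rolling change relative to the value nper steps earlier (assumed formula)."""
    out = [None] * len(values)   # the first nper entries have no reference value
    for i in range(nper, len(values)):
        out[i] = values[i] / values[i - nper] - 1.0
    return out


anom = anomalies([1.0, 2.0, 3.0])
changes = pct_change([100.0, 110.0, 121.0])
```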
