mirror of
https://github.com/rasterio/rasterio.git
synced 2025-12-08 17:36:12 +00:00
* Remove rasterio.vfs from doc * Fix broken links in doc * Fix typos in doc * Update CLI doc * Use rst links in doc
373 lines
13 KiB
ReStructuredText
373 lines
13 KiB
ReStructuredText
Switching from GDAL's Python bindings
|
||
=====================================
|
||
|
||
This document is written specifically for users of GDAL's Python bindings
|
||
(``osgeo.gdal``) who have read about Rasterio's :doc:`philosophy <../intro>` and
|
||
want to know what switching entails. The good news is that switching may not
|
||
be complicated. This document explains the key similarities and differences
|
||
between these two Python packages and highlights the features of Rasterio that
|
||
can help in switching.
|
||
|
||
Mutual Incompatibilities
|
||
------------------------
|
||
|
||
Rasterio and GDAL's bindings can contend for global GDAL objects. Unless you
|
||
have deep knowledge about both packages, choose exactly one of ``import
|
||
osgeo.gdal`` or ``import rasterio``.
|
||
|
||
GDAL's bindings (``gdal`` for the rest of this document) and Rasterio are not
|
||
entirely compatible and should not, without a great deal of care, be imported
|
||
and used in a single Python program. The reason is that the dynamic library
|
||
they each load (these are C extension modules, remember), ``libgdal.so`` on
|
||
Linux, ``gdal.dll`` on Windows, has a number of global objects and the two
|
||
modules take different approaches to managing these objects.
|
||
|
||
Static linking of the GDAL library for ``gdal`` and ``rasterio`` can avoid
|
||
this contention, but in practice you will almost never see distributions of
|
||
these modules that statically link the GDAL library.
|
||
|
||
Beyond the issues above, the modules have different styles – ``gdal`` reads and
|
||
writes like C while ``rasterio`` is more Pythonic – and don't complement each
|
||
other well.
|
||
|
||
The GDAL Environment
|
||
--------------------
|
||
|
||
GDAL library functions are excuted in a context of format drivers, error
|
||
handlers, and format-specific configuration options that this document will
|
||
call the "GDAL Environment." Rasterio has an abstraction for the GDAL
|
||
environment, ``gdal`` does not.
|
||
|
||
With ``gdal``, this context is initialized upon import of the module. This
|
||
makes sense because ``gdal`` objects are thin wrappers around functions and
|
||
classes in the GDAL dynamic library that generally require registration of
|
||
drivers and error handlers. The ``gdal`` module doesn't have an abstraction
|
||
for the environment, but it can be modified using functions like
|
||
``gdal.SetErrorHandler()`` and ``gdal.UseExceptions()``.
|
||
|
||
Rasterio has modules that don't require complete initialization and
|
||
configuration of GDAL (``rasterio.dtypes``, ``rasterio.profiles``, and
|
||
``rasterio.windows``, for example) and in the interest of reducing overhead
|
||
doesn't register format drivers and error handlers until they are needed. The
|
||
functions that do need fully initialized GDAL environments will ensure that
|
||
they exist. ``rasterio.open()`` is the foremost of this category of functions.
|
||
Consider the example code below.
|
||
|
||
.. code-block:: python
|
||
|
||
import rasterio
|
||
# The GDAL environment has no registered format drivers or error
|
||
# handlers at this point.
|
||
|
||
with rasterio.open('example.tif') as src:
|
||
# Format drivers and error handlers are registered just before
|
||
# open() executes.
|
||
|
||
Importing ``rasterio`` does not initialize the GDAL environment. Calling
|
||
``rasterio.open()`` does. This is different from ``gdal`` where ``import
|
||
osgeo.gdal``, not ``osgeo.gdal.Open()``, initializes the GDAL environment.
|
||
|
||
Rasterio has an abstraction for the GDAL environment, ``rasterio.Env``, that
|
||
can be invoked explicitly for more control over the configuration of GDAL as
|
||
shown below.
|
||
|
||
.. code-block:: python
|
||
|
||
import rasterio
|
||
# The GDAL environment has no registered format drivers or error
|
||
# handlers at this point.
|
||
|
||
with rasterio.Env(CPL_DEBUG=True, GDAL_CACHEMAX=512):
|
||
# This ensures that all drivers are registered in the global
|
||
# context. Within this block *only* GDAL's debugging messages
|
||
# are turned on and the raster block cache size is set to 512MB.
|
||
|
||
with rasterio.open('example.tif') as src:
|
||
# Perform GDAL operations in this context.
|
||
# ...
|
||
# Done.
|
||
|
||
# At this point, configuration options are set back to their
|
||
# previous (possibly unset) values. The raster block cache size
|
||
# is returned to its default (5% of available RAM) and debugging
|
||
# messages are disabled.
|
||
|
||
As mentioned previously, ``gdal`` has no such abstraction for the GDAL
|
||
environment. The nearest approximation would be something like the code
|
||
below.
|
||
|
||
.. code-block:: python
|
||
|
||
from osgeo import gdal
|
||
|
||
# Define a new configuration, save the previous configuration,
|
||
# and then apply the new one.
|
||
new_config = {
|
||
'CPL_DEBUG': 'ON', 'GDAL_CACHEMAX': '512'}
|
||
prev_config = {
|
||
key: gdal.GetConfigOption(key) for key in new_config.keys()}
|
||
for key, val in new_config.items():
|
||
gdal.SetConfigOption(key, val)
|
||
|
||
# Perform GDAL operations in this context.
|
||
# ...
|
||
# Done.
|
||
|
||
# Restore previous configuration.
|
||
for key, val in prev_config.items():
|
||
gdal.SetConfigOption(key, val)
|
||
|
||
Rasterio achieves this with a single Python statement.
|
||
|
||
.. code-block:: python
|
||
|
||
with rasterio.Env(CPL_DEBUG=True, GDAL_CACHEMAX=512):
|
||
# ...
|
||
|
||
|
||
Format Drivers
|
||
--------------
|
||
|
||
``gdal`` provides objects for each of the GDAL format drivers. With Rasterio,
|
||
format drivers are represented by strings and are used only as arguments to
|
||
functions like ``rasterio.open()``.
|
||
|
||
.. code-block:: python
|
||
|
||
dst = rasterio.open('new.tif', 'w', format='GTiff', **kwargs)
|
||
|
||
Rasterio uses the same format driver names as GDAL does.
|
||
|
||
Dataset Identifiers
|
||
-------------------
|
||
|
||
Rasterio uses URIs to identify datasets, with schemes for different protocols.
|
||
The GDAL bindings have their own special syntax.
|
||
|
||
Unix-style filenames such as ``/var/data/example.tif`` identify dataset files
|
||
for both Rasterio and ``gdal``. Rasterio also accepts 'file' scheme URIs
|
||
like ``file:///var/data/example.tif``.
|
||
|
||
Rasterio identifies datasets within ZIP or tar archives using Apache VFS style
|
||
identifiers like ``zip:///var/data/example.zip!example.tif`` or
|
||
``tar:///var/data/example.tar!example.tif``.
|
||
|
||
Datasets served via HTTPS are identified using 'https' URIs like
|
||
``https://landsat-pds.s3.amazonaws.com/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF``.
|
||
|
||
Datasets on AWS S3 are identified using 's3' scheme identifiers like
|
||
``s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF``.
|
||
|
||
With ``gdal``, the equivalent identifiers are respectively
|
||
``/vsizip//var/data/example.zip/example.tif``,
|
||
``/vsitar//var/data/example.tar/example.tif``,
|
||
``/vsicurl/landsat-pds.s3.amazonaws.com/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF``,
|
||
and
|
||
``/vsis3/landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF``.
|
||
|
||
To help developers switch, Rasterio will accept these identifiers and other
|
||
format-specific connection strings, too, and dispatch them to the proper format
|
||
drivers and protocols.
|
||
|
||
Dataset Objects
|
||
---------------
|
||
|
||
Rasterio and ``gdal`` each have dataset objects. Not the same classes, of
|
||
course, but not radically different ones. In each case, you generally get
|
||
dataset objects through an "opener" function: ``rasterio.open()`` or
|
||
``gdal.Open()``.
|
||
|
||
So that Python developers can spend less time reading docs, the dataset object
|
||
returned by ``rasterio.open()`` is modeled on Python's file object. It even has
|
||
the ``close()`` method that ``gdal`` lacks so that you can actively close
|
||
dataset connections.
|
||
|
||
Bands
|
||
-----
|
||
|
||
``gdal`` has band objects. Rasterio does not and thus never has objects with
|
||
dangling dataset pointers. With Rasterio, bands are represented by a numerical
|
||
index, starting from 1 (as GDAL does), and are used as arguments to dataset
|
||
methods. To read the first band of a dataset as a Numpy ``ndarray``, do this.
|
||
|
||
.. code-block:: python
|
||
|
||
with rasterio.open('example.tif') as src:
|
||
band1 = src.read(1)
|
||
|
||
Other attributes of GDAL band objects generally surface in Rasterio as tuples
|
||
returned by dataset attributes, with one value per band, in order.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> src = rasterio.open('example.tif')
|
||
>>> src.indexes
|
||
(1, 2, 3)
|
||
>>> src.dtypes
|
||
('uint8', 'uint8', 'uint8')
|
||
>>> src.descriptions
|
||
('Red band', 'Green band', 'Blue band')
|
||
>>> src.units
|
||
('DN', 'DN', 'DN')
|
||
|
||
Developers that want read-only band objects for their applications can create
|
||
them by zipping these tuples together.
|
||
|
||
.. code-block:: python
|
||
|
||
from collections import namedtuple
|
||
|
||
Band = namedtuple('Band', ['idx', 'dtype', 'description', 'units'])
|
||
|
||
src = rasterio.open('example.tif')
|
||
bands = [Band(vals) for vals in zip(
|
||
src.indexes, src.dtypes, src.descriptions, src.units)]
|
||
|
||
Namedtuples are like lightweight classes.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> for band in bands:
|
||
... print(band.idx)
|
||
...
|
||
1
|
||
2
|
||
3
|
||
|
||
Geotransforms
|
||
-------------
|
||
|
||
The ``transform`` attribute of a Rasterio dataset object is comparable to the
|
||
``GeoTransform`` attribute of a GDAL dataset, but Rasterio's has more power.
|
||
It's not just an array of affine transformation matrix elements, it's an
|
||
instance of an ``Afine`` class and has many handy methods. For example, the
|
||
spatial coordinates of the upper left corner of any raster element is the
|
||
product of the dataset's ``transform`` matrix and the ``(column, row)`` index
|
||
of the element.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> src = rasterio.open('example.tif')
|
||
>>> src.transform * (0, 0)
|
||
(101985.0, 2826915.0)
|
||
|
||
The affine transformation matrix can be inverted as well.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> ~src.transform * (101985.0, 2826915.0)
|
||
(0.0, 0.0)
|
||
|
||
To help developers switch, ``Affine`` instances can be created from or
|
||
converted to the sequences used by ``gdal``.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> from rasterio.transform import Affine
|
||
>>> Affine.from_gdal(101985.0, 300.0379266750948, 0.0,
|
||
... 2826915.0, 0.0, -300.041782729805).to_gdal()
|
||
...
|
||
(101985.0, 300.0379266750948, 0.0, 2826915.0, 0.0, -300.041782729805)
|
||
|
||
Coordinate Reference Systems
|
||
----------------------------
|
||
|
||
The ``crs`` attribute of a Rasterio dataset object is an instance of Rasterio's
|
||
``CRS`` class and works well with ``pyproj``.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> from pyproj import Proj, transform
|
||
>>> src = rasterio.open('example.tif')
|
||
>>> transform(Proj(src.crs), Proj('+init=epsg:3857'), 101985.0, 2826915.0)
|
||
(-8789636.707871985, 2938035.238323653)
|
||
|
||
Tags
|
||
----
|
||
|
||
GDAL metadata items are called "tags" in Rasterio. The tag set for a given GDAL
|
||
metadata namespace is represented as a dict.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> src.tags()
|
||
{'AREA_OR_POINT': 'Area'}
|
||
>>> src.tags(ns='IMAGE_STRUCTURE')
|
||
{'INTERLEAVE': 'PIXEL'}
|
||
|
||
The semantics of the tags in GDAL's default and ``IMAGE_STRUCTURE`` namespaces
|
||
are described in http://www.gdal.org/gdal_datamodel.html. Rasterio uses
|
||
several namespaces of its own: ``rio_creation_kwds`` and ``rio_overviews``,
|
||
each with their own semantics.
|
||
|
||
Offsets and Windows
|
||
-------------------
|
||
|
||
Rasterio adds an abstraction for subsets or windows of a raster array that
|
||
GDAL does not have. A window is a pair of tuples, the first of the pair being
|
||
the raster row indexes at which the window starts and stops, the second being
|
||
the column indexes at which the window starts and stops. Row before column,
|
||
as with ``ndarray`` slices. Instances of ``Window`` are created by passing the
|
||
four subset parameters used with ``gdal`` to the class constructor.
|
||
|
||
.. code-block:: python
|
||
|
||
src = rasterio.open('example.tif')
|
||
|
||
xoff, yoff = 0, 0
|
||
xsize, ysize = 10, 10
|
||
subset = src.read(1, window=Window(xoff, yoff, xsize, ysize))
|
||
|
||
Valid Data Masks
|
||
----------------
|
||
|
||
Rasterio provides an array for every dataset representing its valid data mask
|
||
using the same indicators as GDAL: ``0`` for invalid data and ``255`` for valid
|
||
data.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> src = rasterio.open('example.tif')
|
||
>>> src.dataset_mask()
|
||
array([[0, 0, 0, ..., 0, 0, 0],
|
||
[0, 0, 0, ..., 0, 0, 0],
|
||
[0, 0, 0, ..., 0, 0, 0],
|
||
...,
|
||
[0, 0, 0, ..., 0, 0, 0],
|
||
[0, 0, 0, ..., 0, 0, 0],
|
||
[0, 0, 0, ..., 0, 0, 0]], dtype-uint8)
|
||
|
||
Arrays for dataset bands can also be had as a Numpy ``masked_array``.
|
||
|
||
.. code-block:: pycon
|
||
|
||
>>> src.read(1, masked=True)
|
||
masked_array(data =
|
||
[[-- -- -- ..., -- -- --]
|
||
[-- -- -- ..., -- -- --]
|
||
[-- -- -- ..., -- -- --]
|
||
...,
|
||
[-- -- -- ..., -- -- --]
|
||
[-- -- -- ..., -- -- --]
|
||
[-- -- -- ..., -- -- --]],
|
||
mask =
|
||
[[ True True True ..., True True True]
|
||
[ True True True ..., True True True]
|
||
[ True True True ..., True True True]
|
||
...,
|
||
[ True True True ..., True True True]
|
||
[ True True True ..., True True True]
|
||
[ True True True ..., True True True]],
|
||
fill_value = 0)
|
||
|
||
Where the masked array's ``mask`` is ``True``, the data is invalid and has been
|
||
masked "out" in the opposite sense of GDAL's mask.
|
||
|
||
Errors and Exceptions
|
||
---------------------
|
||
|
||
Rasterio always raises Python exceptions when an error occurs and never returns
|
||
an error code or ``None`` to indicate an error. ``gdal`` takes the opposite
|
||
approach, although developers can turn on exceptions by calling
|
||
``gdal.UseExceptions()``.
|