{ "metadata": { "name": "", "signature": "sha256:5a6908bb26106597e34dd231b1a5f453aaa6e8a3e4c9298d8c3baaf3c3e0c4a1" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An introduction to Rasterio\n", "\n", "The smallest interesting problems [1] addressed by Rasterio are reading raster data from files as [Numpy](http://www.numpy.org/) arrays and writing such arrays back to files. In between, you can use the wider ecosystem of scientific Python software to analyze and process the data. Rasterio also provides a few operations that are described in the next notebooks in this series.\n", "\n", "This notebook demonstrates the basics of reading and writing raster data with Rasterio.\n", "\n", "## Overview of a dataset\n", "\n", "A raster dataset consists of one or more dense (as opposed to sparse) 2-D arrays of scalar values. An RGB TIFF image file is a good example of a raster dataset. It has 3 bands (or channels \u2013 we'll call them bands here), and each band has a number of rows (its `height`) and columns (its `width`) and a uniform data type (unsigned 8-bit integers, 64-bit floats, etc.). Geospatially referenced datasets also possess a mapping from image to world coordinates (a `transform`) in a specific coordinate reference system (`crs`). This metadata about a dataset is readily accessible using Rasterio.\n", "\n", "The Scientific Python community often imports numpy as `np`. Do this and also import rasterio." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "\n", "import rasterio" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Many of Rasterio's tests use a small 3-band GeoTIFF file named \"RGB.byte.tif\". Open it using the function `rasterio.open()`."
] }, { "cell_type": "code", "collapsed": false, "input": [ "src = rasterio.open('../tests/data/RGB.byte.tif')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function returns a dataset object. It has many of the same properties as a Python file object." ] }, { "cell_type": "code", "collapsed": false, "input": [ "src.name" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "'../tests/data/RGB.byte.tif'" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "src.mode" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "'r'" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "src.closed" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "False" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Raster datasets have additional structure. A description of that structure is available all at once from the dataset's `meta` property, or item by item from individual properties such as `crs`."
] }, { "cell_type": "code", "collapsed": false, "input": [ "src.meta" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "{'affine': Affine(300.0379266750948, 0.0, 101985.0,\n", " 0.0, -300.041782729805, 2826915.0),\n", " 'count': 3,\n", " 'crs': {'init': u'epsg:32618'},\n", " 'driver': u'GTiff',\n", " 'dtype': 'uint8',\n", " 'height': 718,\n", " 'nodata': 0.0,\n", " 'transform': (101985.0,\n", " 300.0379266750948,\n", " 0.0,\n", " 2826915.0,\n", " 0.0,\n", " -300.041782729805),\n", " 'width': 791}" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "src.crs" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "{'init': u'epsg:32618'}" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To close an opened dataset, use its `close()` method." ] }, { "cell_type": "code", "collapsed": false, "input": [ "src.close()\n", "src.closed" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "True" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can't read from or write to a closed dataset, but you can continue to access its properties." ] }, { "cell_type": "code", "collapsed": false, "input": [ "src.driver" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "u'GTiff'" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset layout\n", "\n", "Three properties of a Rasterio dataset tell you a lot about it in Numpy terms. The `shape` of a dataset is a `height, width` tuple and is exactly the shape of Numpy arrays that would be read from it. The testing dataset has 718 rows and 791 columns."
] }, { "cell_type": "code", "collapsed": false, "input": [ "src.shape" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 26, "text": [ "(718, 791)" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `count` of bands in the dataset is 3." ] }, { "cell_type": "code", "collapsed": false, "input": [ "src.count" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "3" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "All three of its bands contain 8-bit unsigned integers." ] }, { "cell_type": "code", "collapsed": false, "input": [ "src.dtypes" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 28, "text": [ "['uint8', 'uint8', 'uint8']" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numpy concepts are the model here. If you wanted to create a 3-D Numpy array into which the testing data file's bands would fit without any resampling, you would use the following Python code." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "dest = np.empty((src.count,) + src.shape, dtype='uint8')\n", "dest" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 25, "text": [ "array([[[0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " ..., \n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0]],\n", "\n", " [[0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " ..., \n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0]],\n", "\n", " [[0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " ..., \n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0],\n", " [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[1]: Mike Bostock's words from his FOSS4G keynote, 2014-09-10" ] } ], "metadata": {} } ] }