Data Science Projects


Fine Scale Weather Data from 1900-2013


Project Link:

Overview

Standard measures of weather include minimum, maximum, and average temperatures for a given day, month, or year. However, throughout a day the temperature rises and falls, which standard measures are not able to capture due to the linear nature of the measurement. To estimate the nonlinear relationship in a day, I utilize an integrated sine approach to calculate degree days and time in each degree. These measures provide better covariates for explaining the effect of weather on various estimators (crop yield).

Degree days and time in each degree provides a measure of looking at temperature effects during the day. Degree days can be defined as the amount of time during a day that the environment is exposed to a threshold of temperature. For example, suppose for half of the day it is 30C. A simple calculation of degree days above 25°C would involve 5 degrees for half of a day, so for that particular day the degree days above 25°C would be 2.5. The exposure during the day becomes longer as it becomes warmer or shorter if it is cooler. Time in each degree is similiar in definition, but instead each degree gets a time interval. Time in each degree is defined as the length of heat exposure to each one-degree Celsisus temperature interval in each day. (See Synyder 1985)

Daily minimum and maximum weather data is needed in order to calculate these two variables. More importantly, gridded data (2.5 x 2.5 miles) provides an accurate relationship to regions and are generally thought of as more precise than using zip codes. This data is not available for the suggested time period so an interpolation technique is used to get to daily data using monthly data. Calculations of degree days and time in each degree can then be extracted using the necessary mathmatical technique.

Using daily data available from the National Climatic Data Center (NCDC: http://www.ncdc.noaa.gov/) for various weather stations and PRISM (http://www.prism.oregonstate.edu/) data that provides monthly data on a grid a relatively anomaly spline interpolation technique is used to build fine scale weather data on a 2.5 x 2.5 mile grid scale.

The main idea behind building gridded data is to use the monthly temperatures from PRISM, place them at the midpoint (15th day) of each month for each year, and run a spline through each midpoint for each month of each day for each grid. This will in turn produce fine scale gridded daily data. Next, find the relative anamoly where R = NCDC(month) / PRISM(month). Finally, use Inverse Distance Weighting (IDW) to find the 5 closest NCDC stations for each PRISM grid to weight the temperature changes and apply the relative anamoly to this weighted temperature. This will produce gridded 2.5 x 2.5 mile daily data that is between the PRISM and NCDC data. For a simplegit example in R see Interpolation Technique




R-package: nonlineartempr


Project Link:

Overview

nonlineartempr calculates nonlinear temperature distributions using an integrated sine technique. Degree days define time above a specified temperature threshold (e.g. degree days above 30C) and time in each degree define time within a specified temperature threshold (e.g. time in 30C).

Installation

# Install through devtools:install_github()

# install.packages("devtools")
devtools::install_github("johnwoodill/nonlineartempr")

Usage

library(nonlineartempr)

# Load Napa County, CA data
data(napa)

# Degree Days
dd_napa <- degree_days(napa, thresholds = c(0:35))
head(dd_napa)
        date year month day fips      county state  lat   long    tmax   tmin    tavg  dday0C  dday1C
#> 1 1899-12-15 1899    12  15 6055 Napa County    CA 38.5 -122.5 12.5000 4.1100 8.30500 8.30500 7.30500
#> 2 1899-12-16 1899    12  16 6055 Napa County    CA 38.5 -122.5 12.5445 4.2894 8.41695 8.41695 7.41695
#> 3 1899-12-17 1899    12  17 6055 Napa County    CA 38.5 -122.5 12.5878 4.4574 8.52260 8.52260 7.52260
#> 4 1899-12-18 1899    12  18 6055 Napa County    CA 38.5 -122.5 12.6298 4.6144 8.62210 8.62210 7.62210
#> 5 1899-12-19 1899    12  19 6055 Napa County    CA 38.5 -122.5 12.6706 4.7604 8.71550 8.71550 7.71550
#> 6 1899-12-20 1899    12  20 6055 Napa County    CA 38.5 -122.5 12.7103 4.8959 8.80310 8.80310 7.80310
#>    dday2C  dday3C  dday4C  dday5C  dday6C  dday7C  dday8C  dday9C dday10C dday11C dday12C dday13C
#> 1 6.30500 5.30500 4.30500 3.42938 2.69499 2.05296 1.49134 1.00618 0.59837 0.27430 0.05212       0
#> 2 6.41695 5.41695 4.41695 3.50622 2.75479 2.10052 1.52902 1.03544 0.62019 0.28919 0.05975       0
#> 3 6.52260 5.52260 4.52260 3.58250 2.81355 2.14717 1.56600 1.06423 0.64173 0.30402 0.06757       0
#> 4 6.62210 5.62210 4.62210 3.65817 2.87108 2.19273 1.60214 1.09242 0.66291 0.31871 0.07553       0
#> 5 6.71550 5.71550 4.71550 3.73325 2.92721 2.23709 1.63735 1.11996 0.68369 0.33324 0.08359       0
#> 6 6.80310 5.80310 4.80310 3.80821 2.98185 2.28020 1.67162 1.14683 0.70407 0.34760 0.09174       0
#> ...

# Time in each degree
td_napa <- degree_time(napa, thresholds = c(0:35))
head(td_napa)

#>         date year month day fips      county state  lat   long    tmax   tmin tdegree0C tdegree1C
#> 1 1899-12-15 1899    12  15 6055 Napa County    CA 38.5 -122.5 12.5000 4.1100         0         0
#> 2 1899-12-16 1899    12  16 6055 Napa County    CA 38.5 -122.5 12.5445 4.2894         0         0
#> 3 1899-12-17 1899    12  17 6055 Napa County    CA 38.5 -122.5 12.5878 4.4574         0         0
#> 4 1899-12-18 1899    12  18 6055 Napa County    CA 38.5 -122.5 12.6298 4.6144         0         0
#> 5 1899-12-19 1899    12  19 6055 Napa County    CA 38.5 -122.5 12.6706 4.7604         0         0
#> 6 1899-12-20 1899    12  20 6055 Napa County    CA 38.5 -122.5 12.7103 4.8959         0         0
#>   tdegree2C tdegree3C tdegree4C tdegree5C tdegree6C tdegree7C tdegree8C tdegree9C tdegree10C tdegree11C
#> 1         0         0   0.02960   0.07636   0.07794   0.08090   0.08576   0.09359    0.10704    0.13481
#> 2         0         0   0.01624   0.07745   0.07889   0.08172   0.08645   0.09412    0.10727    0.13413
#> 3         0         0   0.00334   0.07853   0.07981   0.08251   0.08711   0.09462    0.10749    0.13350
#> 4         0         0   0.00000   0.07048   0.08070   0.08327   0.08774   0.09509    0.10769    0.13292
#> 5         0         0   0.00000   0.05961   0.08155   0.08399   0.08833   0.09553    0.10786    0.13237
#> 6         0         0   0.00000   0.04926   0.08235   0.08467   0.08888   0.09593    0.10801    0.13183
#> ...

python-package: nonlineartemppy


nonlineartemppy

Overview

nonlineartemppy calculates nonlinear temperature distributions using an integrated sine technique. Degree days define time above a specified temperature threshold (e.g. degree days above 30C) and time in each degree define time within a specified temperature threshold (e.g. time in 30C).

Installation

# Install through pip
pip install nonlineartemppy

Usage

import numpy as np
import pandas as pd
import nonlineartemppy.calculations as nltemp
import io
import pkgutil

# Import Napa data set
data = pkgutil.get_data('nonlineartemppy', 'data/napa.csv')
napa = pd.read_csv(io.BytesIO(data), encoding='utf8', sep=",", dtype={"switch": np.int8})
napa = pd.DataFrame(napa)

# Degree Days
dd_napa = nltemp.degree_days(napa, range(0,5))

dd_napa.head()

         date    year  month  day  fips       county state   lat   long  \
0  1899-12-15  1899.0     12   15  6055  Napa County    CA  38.5 -122.5   
1  1899-12-16  1899.0     12   16  6055  Napa County    CA  38.5 -122.5   
2  1899-12-17  1899.0     12   17  6055  Napa County    CA  38.5 -122.5   
3  1899-12-18  1899.0     12   18  6055  Napa County    CA  38.5 -122.5   
4  1899-12-19  1899.0     12   19  6055  Napa County    CA  38.5 -122.5   

      tmax    tmin     tavg   dday0C   dday1C   dday2C   dday3C   dday4C  
0  12.5000  4.1100  8.30500  8.30500  7.30500  6.30500  5.30500  4.30500  
1  12.5445  4.2894  8.41695  8.41695  7.41695  6.41695  5.41695  4.41695  
2  12.5878  4.4574  8.52260  8.52260  7.52260  6.52260  5.52260  4.52260  
3  12.6298  4.6144  8.62210  8.62210  7.62210  6.62210  5.62210  4.62210  
4  12.6706  4.7604  8.71550  8.71550  7.71550  6.71550  5.71550  4.71550  


# Time in each degree
td_napa = nltemp.degree_time(napa, range(0,5))
td_napa.head()

         date    year  month  day  fips       county state   lat   long  \
0  1899-12-15  1899.0     12   15  6055  Napa County    CA  38.5 -122.5   
1  1899-12-16  1899.0     12   16  6055  Napa County    CA  38.5 -122.5   
2  1899-12-17  1899.0     12   17  6055  Napa County    CA  38.5 -122.5   
3  1899-12-18  1899.0     12   18  6055  Napa County    CA  38.5 -122.5   
4  1899-12-19  1899.0     12   19  6055  Napa County    CA  38.5 -122.5   

      tmax    tmin  tdegree0C  tdegree1C  tdegree2C  tdegree3C  tdegree4C  
0  12.5000  4.1100        0.0        0.0        0.0        0.0    0.02960  
1  12.5445  4.2894        0.0        0.0        0.0        0.0    0.01624  
2  12.5878  4.4574        0.0        0.0        0.0        0.0    0.00334  
3  12.6298  4.6144        0.0        0.0        0.0        0.0    0.00000  
4  12.6706  4.7604        0.0        0.0        0.0        0.0    0.00000