Final Project: Earthquakes in the Bay Area by Jacqueline Zuckut¶

Introduction¶

Question: Focusing on the regions of the San Francisco Bay Area in the San Francisco Peninsula, North East Bay, and South East Bay, along major faults including the San Andreas Fault and Hayward Fault, how unusual is the current gap in major seismic activity based on data from 1800 to present? Does the number of lower magnitude earthquakes in a year reflect the amount of major seismic activity in the region around that time?

Background: For many years, people living in the Bay Area have been asking questions about "the Big One" and when the next major earthquake will occur, some people saying that the next major earthquake is overdue for the Bay Area. When exploring this topic, one relevant report I came across is the U.S. Geological Survey's Probabilities of Large Earthquakes in the San Francisco Bay Area Region, California. This report inspired my analysis, because I found the 30-year probabilities interesting, especially since it has now been over 30 years since the book was published, in 1990. Also, I was looking for a study that focused on a region that is not too large, and this study specifically focused on the San Andreas, Hayward, and Rodgers Creek Fault. In my analysis, I decided to focus on the areas encompassing the San Andreas and Hayward faults. According to the report, there was an approximated 67% probability of an earthquake in the Bay Area of about magnitude 7 sometime from 1990 to 2020 and a 20 to 30% probability of a 7 magnitude earthquake on both San Andreas and Hayward faults (USGS 3).

Approach: I will filter the data from the USGS Earthquake Catalog to focus in on a specific region encompassing the San Francisco Peninsula, North East Bay, and South East Bay for earthquakes with magnitude of at least 5.5 from 1800 to 2023. I have included maps of the region with the earthquakes' locations plotted and maps from the USGS of where faults run. The main faults in the region I am exploring are the San Andreas and Hayward Faults. I will then calculate the previous seismic gaps between major earthquakes, which I define as earthquakes of a magnitude of at least 5.5. I will plot these gaps as a histogram to show the distribution of seismic gaps in our historical period from 1800 to 2023 (not including our current seismic gap as a data point). I will then use a bootstrap confidence interval to estimate the mean seismic gap and compare the current seismic gap, time since the most recent major earthquake, to the mean. Then, I will calculate the Z-score of the current seismic gap compared to the sample mean and standard deviation in order to find the probability of an observation as extreme or more extreme as the current seismic gap. Next, I will look at whether the number of lower magnitude earthquakes in the year has any relation to the number of major earthquakes in that year. I will compile the amount of lower magnitude earthquakes, from a 3 to 5.5 magnitude, by year, from 1974 to present, and plot the total number of earthquakes per year as a line plot, with a subplot being the scatterplot of major earthquakes over time, to see if there is a correlation between an increase in number of lower magnitude earthquakes with a large number of major earthquakes.

In [1]:
import numpy as np
from matplotlib import cm
from cartopy import config
import cartopy.crs as ccrs

import pandas as pd

import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import time
import scipy.stats
from sklearn.neighbors import KernelDensity
In [2]:
start_day = ''
end_day = ''
standard_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=time'

query_url = standard_url + '&starttime=' + '1800-01-01' + '&endtime=' + '2023-04-29' + '&maxlatitude=38.2'+ '&minlatitude=37' + '&maxlongitude=-121.6'+ '&minlongitude=-122.9'+'&minmagnitude=5.5'
query_url
Out[2]:
'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=time&starttime=1800-01-01&endtime=2023-04-29&maxlatitude=38.2&minlatitude=37&maxlongitude=-121.6&minlongitude=-122.9&minmagnitude=5.5'
In [3]:
earthquake_data = pd.read_csv(query_url)
earthquake_data.head()
Out[3]:
time latitude longitude depth mag magType nst gap dmin rms ... updated place type horizontalError depthError magError magNst status locationSource magSource
0 1989-10-18T00:04:15.190Z 37.036167 -121.879833 17.214 6.90 mh 84.0 89.0 0.008108 0.08 ... 2023-02-23T07:58:03.072Z Loma Prieta, California Earthquake earthquake 0.21 0.31 NaN 0.0 reviewed nc nc
1 1986-03-31T11:55:39.810Z 37.479167 -121.686667 8.502 5.70 ml 87.0 65.0 0.027030 0.06 ... 2022-04-27T22:23:44.508Z 15 km NE of East Foothills, California earthquake 0.13 0.41 NaN 6.0 reviewed nc nc
2 1984-04-24T21:15:18.760Z 37.309667 -121.678833 8.193 6.20 ml 101.0 26.0 0.051350 0.06 ... 2022-04-27T23:27:43.358Z 14 km E of Seven Trees, California earthquake 0.11 0.31 NaN 0.0 reviewed nc nc
3 1980-01-24T19:00:09.500Z 37.852000 -121.815000 11.000 5.80 mw NaN NaN NaN NaN ... 2022-04-28T01:48:47.870Z 8 km ENE of Blackhawk, California earthquake NaN NaN NaN NaN reviewed b 01009
4 1957-03-22T19:44:22.270Z 37.557000 -122.724000 15.000 5.66 mw NaN NaN NaN NaN ... 2022-04-26T17:10:06.034Z 18 km W of Montara, California earthquake NaN 1.90 0.65 NaN reviewed iscgem iscgem

5 rows × 22 columns

In [4]:
earthquake_data['time'] = pd.to_datetime(earthquake_data['time'])
earthquake_data.head()
Out[4]:
time latitude longitude depth mag magType nst gap dmin rms ... updated place type horizontalError depthError magError magNst status locationSource magSource
0 1989-10-18 00:04:15.190000+00:00 37.036167 -121.879833 17.214 6.90 mh 84.0 89.0 0.008108 0.08 ... 2023-02-23T07:58:03.072Z Loma Prieta, California Earthquake earthquake 0.21 0.31 NaN 0.0 reviewed nc nc
1 1986-03-31 11:55:39.810000+00:00 37.479167 -121.686667 8.502 5.70 ml 87.0 65.0 0.027030 0.06 ... 2022-04-27T22:23:44.508Z 15 km NE of East Foothills, California earthquake 0.13 0.41 NaN 6.0 reviewed nc nc
2 1984-04-24 21:15:18.760000+00:00 37.309667 -121.678833 8.193 6.20 ml 101.0 26.0 0.051350 0.06 ... 2022-04-27T23:27:43.358Z 14 km E of Seven Trees, California earthquake 0.11 0.31 NaN 0.0 reviewed nc nc
3 1980-01-24 19:00:09.500000+00:00 37.852000 -121.815000 11.000 5.80 mw NaN NaN NaN NaN ... 2022-04-28T01:48:47.870Z 8 km ENE of Blackhawk, California earthquake NaN NaN NaN NaN reviewed b 01009
4 1957-03-22 19:44:22.270000+00:00 37.557000 -122.724000 15.000 5.66 mw NaN NaN NaN NaN ... 2022-04-26T17:10:06.034Z 18 km W of Montara, California earthquake NaN 1.90 0.65 NaN reviewed iscgem iscgem

5 rows × 22 columns

In [5]:
plt.scatter(earthquake_data['time'], earthquake_data['mag'])
plt.xlabel('Year')
plt.ylabel('Magnitude')
plt.title('Major Earthquakes since 1800')
plt.xlim(pd.to_datetime('1800-01-01'), pd.to_datetime('2023-01-01'))
plt.show()
In [6]:
fig = plt.figure(figsize=(7,7))
ax = plt.axes(projection=ccrs.Robinson(-46.0))
ax.set_extent([-123, -121.5, 36.9, 38.3], crs=ccrs.PlateCarree())

plt.scatter(earthquake_data['longitude'],earthquake_data['latitude'],transform=ccrs.PlateCarree(), s = (earthquake_data['mag']-5)*100, color = 'red', marker = 'o', zorder = 100)

plt.title('Earthquakes at least 5.5 Magnitude (1800 to 2023)')    
ax.coastlines()
ax.stock_img()
ax.gridlines()
ax.legend(['Size of point: Relative\nMagnitude'])
plt.show()
In [7]:
import plotly.graph_objects as go

fig = go.Figure(go.Scattermapbox(
    lat = earthquake_data['latitude'], 
    lon = earthquake_data['longitude'],
    mode = 'markers',
    marker=go.scattermapbox.Marker(
            size=(earthquake_data['mag']-5)*10)))

fig.update_layout(
    autosize=True,
    mapbox = {
        'style': "stamen-terrain",
        'center': {'lon': -122.2, 'lat': 37.6 },
        'zoom': 6.8},
    showlegend = False, title = 'Map of Earthquakes at least 5.5 Magnitude (1800 to 2023) in the region of interest (size = Magnitude)')

fig.show()

USGS Map of Faults in the San Francisco Bay Area:¶

UCERF-DEMv17.jpg

Calculate Current Seismic Gap and Compare to Historical Seismic Gaps¶

In [8]:
def seconds_to_years_dec(seconds):
    minutes = seconds/60.0
    hours = minutes/60.0
    days = hours/24.0
    years = days/365.25
    return years
In [9]:
earthquake_data['time_btwn'] = 0.0
for i in range(1, len(earthquake_data)):
    earthquake_data.at[i, 'time_btwn'] = seconds_to_years_dec((np.array(earthquake_data['time'])[i-1] - np.array(earthquake_data['time'])[i]).total_seconds())
In [10]:
earthquake_data.head()
Out[10]:
time latitude longitude depth mag magType nst gap dmin rms ... place type horizontalError depthError magError magNst status locationSource magSource time_btwn
0 1989-10-18 00:04:15.190000+00:00 37.036167 -121.879833 17.214 6.90 mh 84.0 89.0 0.008108 0.08 ... Loma Prieta, California Earthquake earthquake 0.21 0.31 NaN 0.0 reviewed nc nc 0.000000
1 1986-03-31 11:55:39.810000+00:00 37.479167 -121.686667 8.502 5.70 ml 87.0 65.0 0.027030 0.06 ... 15 km NE of East Foothills, California earthquake 0.13 0.41 NaN 6.0 reviewed nc nc 3.549640
2 1984-04-24 21:15:18.760000+00:00 37.309667 -121.678833 8.193 6.20 ml 101.0 26.0 0.051350 0.06 ... 14 km E of Seven Trees, California earthquake 0.11 0.31 NaN 0.0 reviewed nc nc 1.931859
3 1980-01-24 19:00:09.500000+00:00 37.852000 -121.815000 11.000 5.80 mw NaN NaN NaN NaN ... 8 km ENE of Blackhawk, California earthquake NaN NaN NaN NaN reviewed b 01009 4.249401
4 1957-03-22 19:44:22.270000+00:00 37.557000 -122.724000 15.000 5.66 mw NaN NaN NaN NaN ... 18 km W of Montara, California earthquake NaN 1.90 0.65 NaN reviewed iscgem iscgem 22.841805

5 rows × 23 columns

In [11]:
plt.hist(earthquake_data['time_btwn'], bins = 30)
plt.xlabel('Years')
plt.ylabel('Count')
plt.title('Seismic Gaps between Major Earthquakes since 1800')
plt.show()
In [12]:
current_gap = seconds_to_years_dec((datetime.datetime.fromtimestamp(time.time()) - (np.array(earthquake_data['time'])[0])\
                            .tz_localize(None)).total_seconds())

print('Current seismic gap (last major earthquake to present): ', current_gap)
Current seismic gap (last major earthquake to present):  33.56345985848059
In [13]:
sample_means = []

for n in range(0,10000):
    resampled = earthquake_data.sample(n=len(earthquake_data),replace=True)
    mean_time = np.mean(resampled['time_btwn'])
    sample_means.append(mean_time)
In [14]:
plt.figure(figsize=(10,4))
plt.hist(sample_means,alpha=0.5,density=True,color='orange',label='bootstrap resampled seismic gap mean')
plt.axvline(x=np.percentile(sample_means,2.5),linestyle='--',color='orange',label='95% confidence interval on seismic gap mean')
plt.axvline(x=np.percentile(sample_means,97.5),linestyle='--',color='orange')
plt.axvline(x=current_gap,linestyle='-',color='red', label = 'Current Seismic Gap')
plt.legend(loc='lower right')
plt.xlim(0,35)
plt.show()
In [15]:
gaps_mean = np.mean(earthquake_data['time_btwn'])

print('Mean of seismic gaps from 1800 to 2023 (not including present gap): ', gaps_mean)
Mean of seismic gaps from 1800 to 2023 (not including present gap):  5.3330114710784695
In [16]:
print("95% Confidence interval for mean of seimic gaps: ", np.percentile(sample_means,2.5), np.percentile(sample_means,97.5))
95% Confidence interval for mean of seimic gaps:  2.9092289886017473 8.250885968560151
In [17]:
gaps_std = np.std(earthquake_data['time_btwn'])

print('Standard deviation of seismic gaps: ', gaps_std)
Standard deviation of seismic gaps:  7.992422158832153
In [18]:
z_score_current_gap = (current_gap - gaps_mean) / gaps_std
print('Z-score of current gap: ', z_score_current_gap)
Z-score of current gap:  3.5321518090990245
In [19]:
# Want to find probability that a seismic gap would be at least as long as the current gap of roughly 33.552 years.
# To do this find probability that Z is greater than 3.53074
# Using the z-table and Z-Score Probability Calculator (listed under sources), the probability that Z-score
#is greater than 3.53074 is 1 - 0.9998 = 0.0002
# Using a confidence level of 95%, since 0.0002 is less than 0.05, our current seismic gap is very unlikely to occur if the
# distribution of earthquake gaps was normal.

Distribution of Smaller Earthquakes Compared to Major Earthquakes¶

In [20]:
start_day = ''
end_day = ''
standard_url2 = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=time'

query_url2 = standard_url2 + '&starttime=' + '1974-01-01' + '&endtime=' + '2023-04-29' + '&maxlatitude=38.2'+ '&minlatitude=37' + '&maxlongitude=-121.6'+ '&minlongitude=-122.9'+'&minmagnitude=3'+'&maxmagnitude=5.5'
query_url2
Out[20]:
'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=time&starttime=1974-01-01&endtime=2023-04-29&maxlatitude=38.2&minlatitude=37&maxlongitude=-121.6&minlongitude=-122.9&minmagnitude=3&maxmagnitude=5.5'
In [21]:
earthquakes_low_mag = pd.read_csv(query_url2)
In [22]:
earthquakes_low_mag.head()
Out[22]:
time latitude longitude depth mag magType nst gap dmin rms ... updated place type horizontalError depthError magError magNst status locationSource magSource
0 2023-04-26T07:34:33.370Z 37.286833 -121.657667 6.26 3.10 ml 146.0 31.0 0.03041 0.11 ... 2023-05-02T08:49:20.401Z 16km E of Seven Trees, CA earthquake 0.08 0.18 0.152 109.0 reviewed nc nc
1 2023-04-20T03:35:47.710Z 37.916333 -121.839667 16.04 3.65 mw 207.0 32.0 0.05353 0.26 ... 2023-05-12T02:18:34.983Z 9km ESE of Clayton, CA earthquake 0.15 0.39 NaN 3.0 reviewed nc nc
2 2023-04-01T16:43:48.280Z 37.749167 -122.143333 7.62 3.01 ml 149.0 15.0 0.03141 0.10 ... 2023-04-23T21:08:21.040Z 3km NNE of San Leandro, CA earthquake 0.08 0.15 0.244 107.0 reviewed nc nc
3 2023-04-01T16:24:56.230Z 37.750500 -122.142667 7.58 3.00 ml 147.0 15.0 0.03217 0.10 ... 2023-04-23T21:04:12.040Z 3km NNE of San Leandro, CA earthquake 0.08 0.17 0.252 128.0 reviewed nc nc
4 2023-03-28T13:01:18.360Z 37.617333 -122.477333 9.13 3.40 ml 133.0 69.0 0.01982 0.07 ... 2023-05-12T03:42:22.691Z 1km ENE of Pacifica, CA earthquake 0.12 0.17 0.170 146.0 reviewed nc nc

5 rows × 22 columns

In [23]:
yearly_count_low_mag = (pd.to_datetime(earthquakes_low_mag['time']).dt.year.value_counts()).to_frame()
yearly_count_low_mag = yearly_count_low_mag.reset_index()
yearly_count_low_mag.head()
Out[23]:
index time
0 1989 178
1 1984 60
2 1980 57
3 1986 48
4 1990 38
In [24]:
plt.scatter(yearly_count_low_mag['index'], yearly_count_low_mag['time'])
plt.xlabel('Year')
plt.ylabel('Count')
plt.title('Number of Smaller Magnitude Earthquakes by year (1974-2023)')
Out[24]:
Text(0.5, 1.0, 'Number of Smaller Magnitude Earthquakes by year (1974-2023)')
In [25]:
fig, (ax1, ax2) = plt.subplots(2)
plt.subplots_adjust(hspace = 0.5)

arrow_args = dict(arrowstyle="->")

sns.lineplot(data = yearly_count_low_mag, x = "index", y = "time", ax = ax1)
ax1.margins(x=0)
ax1.set(xlabel = 'Year', ylabel = 'Count', title = 'Smaller Magnitude (3 to 5.5 mag) Earthquakes by Year (1980-2023)')

plt.scatter(earthquake_data['time'], earthquake_data['mag'])
ax2.annotate('1989 Loma Prieta Earthquake', xy=(pd.to_datetime('1989-10-18'), 6.90), xycoords='data',
             xytext=(20,-5), textcoords='offset points',
             arrowprops=arrow_args)

plt.xlim(pd.to_datetime('1974-01-01'), pd.to_datetime('2023-01-01'))
plt.ylim(0,8)
plt.xlabel('Year')
plt.ylabel('Magnitude')
plt.title('Major Earthquakes by year (1980-2023)')
plt.show()

Conclusions¶

For the geological region that I chose to focus on, based on the data I collected and filtered for, there has not been a major earthquake, of magnitude at least 5.5, in many years. After calculating the seismic gaps between earthquakes from 1800 to 2023, I graphed their distribution and calculated the current seismic gap. I was surprised by how large the current seismic gap is, at roughly 33.55 years, with the last earthquake over a magnitude 5.5 being the 1989 Loma Prieta Earthquake. Based on the histogram I plotted, the current seismic gap is larger than any of the seismic gaps since 1800. Most seismic gaps were between 0 and 6 years with some outliers, including seismic gaps of up to 30 years. To estimate the average estimated mean seismic gap for earthquakes in the region from 1800 to 2023, I used a 95% confidence interval using bootstrap resampling. My initial sample mean was 5.333 and my 95% confidence interval estimates the mean of the seismic gap to be between 2.833 and 8.164 years. The mean of the seismic gaps gives a benchmark of a typical period between earthquakes, which makes me believe that our current seismic gap is definitely unusual. In order to test the actual likelihood of the current gap based on the mean, I used a Z-score to calculate the probability of a seismic gap as extreme or more extreme than our current one to be observed based on a normal distribution with the mean and standard deviation of the data. I found a Z-score of 3.53, which translates to the probability of an observation as extreme or more extreme as our current seismic gap being about 0.0002 = 0.02%. Our current seismic gap is very unlikely to occur based on our data, so it does seem like the current gap in major seismic activity based on data from 1800 to present in the San Francisco Peninsula, South Bay, and East Bay, is unusual.

When looking into the relationship between the number of smaller earthquakes per year and whether a large earthquake had occurred, I had to first decide which magnitudes to count under smaller earthquakes. In order to not have too many data points, I chose earthquakes with magnitudes between 3 and 5.5 as smaller earthquakes. One thing I noticed is that since I wanted to compare the total number of smaller earthquakes by year, smaller earthquakes only started being recorded consistently from around 1974 to the present. Based on the graph of the number of smaller earthquakes per year compared to the major earthquakes from 1974 to the present, the major earthquakes line up very closely with the years when there were the most smaller earthquakes. For example, during 1989, the year of the major Loma Prieta earthquake, there was a very large spike in the number of smaller earthquakes that year, more than twice as many smaller magnitude earthquakes that year than any other year from 1974 to 2022. Considering that there were 4 years with major earthquakes from 1974 to 2022 and each of these years corresponded to one of the 4 highest numbers of smaller earthquakes, I am confident in stating that there is a strong correlation between whether or not there was a major earthquake in a year and how many smaller (3 to 5.5 mag) earthquakes there were that year.

One thing that I could investigate further is researching the history of the specific plates’ activity more. For example, the San Andreas Fault extends beyond the area that I studied, and it could be that there has been more activity in other parts of the fault that I did not include in my project. Also, I could look more into whether 5.5 is a good threshold for a major earthquake that actually impacts the community and could be dangerous. In addition, it may be unreliable to filter data in the way that I did, because I found one case where an earthquake of about 5.5 magnitude was filtered out, an earthquake in 2007 near East Foothills, CA, which is in the area of interest, that had a magnitude at the origin of 5.5 and a magnitude of Moment Tensor of 5.4. It would be helpful to understand why this and other earthquakes were filtered out and whether these edge-cases are worth including in the dataset. Finally, to expand further on the project, it would be interesting to research whether high numbers of smaller earthquakes cause large earthquakes or if major earthquakes cause increased seismic activity afterwards.

Acknowledgements/Works Cited:

James R. Holliday, Chien-chih Chen, Kristy F. Tiampo, John B. Rundle, Donald L. Turcotte, Andrea Donnellan; A RELM Earthquake Forecast Based on Pattern Informatics. Seismological Research Letters 2007;; 78 (1): 87–93. doi: https://doi.org/10.1785/gssrl.78.1.87

“Latest Earthquakes.” Earthquakes | U.S. Geological Survey, https://earthquake.usgs.gov/earthquakes/map/?extent=36.83347,-122.94525&extent=38.08701,-120.07233&range=search&timeZone=utc&search=%7B%22name%22:%22Search%20Results%22,%22params%22:%7B%22starttime%22:%221800-01-01%2000:00:00%22,%22endtime%22:%222023-04-29%2023:59:59%22,%22maxlatitude%22:38.2,%22minlatitude%22:37,%22maxlongitude%22:-121,%22minlongitude%22:-122.9,%22minmagnitude%22:5.5,%22orderby%22:%22time%22%7D%7D.

“Map of Known Active Geologic Faults in the San Francisco Bay Region.” Map of Known Active Geologic Faults in the San Francisco Bay Region | U.S. Geological Survey, www.usgs.gov/media/images/map-known-active-geologic-faults-san-francisco-bay-region.

Probabilities of Large Earthquakes in the San Francisco Bay Region, California. United States, Department of the Interior, U.S. Geological Survey, 1990. https://www.google.com/books/edition/Probabilities_of_Large_Earthquakes_in_th/aguKoa2cXgQC?hl=en&gbpv=0.

“Z Table and Z Score Calculation.” Z SCORE TABLE, www.z-table.com/.

In [ ]: