Question: Focusing on the regions of the San Francisco Bay Area in the San Francisco Peninsula, North East Bay, and South East Bay, along major faults including the San Andreas Fault and Hayward Fault, how unusual is the current gap in major seismic activity based on data from 1800 to present? Does the number of lower magnitude earthquakes in a year reflect the amount of major seismic activity in the region around that time?
Background: For many years, people living in the Bay Area have been asking questions about "the Big One" and when the next major earthquake will occur, some people saying that the next major earthquake is overdue for the Bay Area. When exploring this topic, one relevant report I came across is the U.S. Geological Survey's Probabilities of Large Earthquakes in the San Francisco Bay Area Region, California. This report inspired my analysis, because I found the 30-year probabilities interesting, especially since it has now been over 30 years since the book was published, in 1990. Also, I was looking for a study that focused on a region that is not too large, and this study specifically focused on the San Andreas, Hayward, and Rodgers Creek Fault. In my analysis, I decided to focus on the areas encompassing the San Andreas and Hayward faults. According to the report, there was an approximated 67% probability of an earthquake in the Bay Area of about magnitude 7 sometime from 1990 to 2020 and a 20 to 30% probability of a 7 magnitude earthquake on both San Andreas and Hayward faults (USGS 3).
Approach: I will filter the data from the USGS Earthquake Catalog to focus in on a specific region encompassing the San Francisco Peninsula, North East Bay, and South East Bay for earthquakes with magnitude of at least 5.5 from 1800 to 2023. I have included maps of the region with the earthquakes' locations plotted and maps from the USGS of where faults run. The main faults in the region I am exploring are the San Andreas and Hayward Faults. I will then calculate the previous seismic gaps between major earthquakes, which I define as earthquakes of a magnitude of at least 5.5. I will plot these gaps as a histogram to show the distribution of seismic gaps in our historical period from 1800 to 2023 (not including our current seismic gap as a data point). I will then use a bootstrap confidence interval to estimate the mean seismic gap and compare the current seismic gap, time since the most recent major earthquake, to the mean. Then, I will calculate the Z-score of the current seismic gap compared to the sample mean and standard deviation in order to find the probability of an observation as extreme or more extreme as the current seismic gap. Next, I will look at whether the number of lower magnitude earthquakes in the year has any relation to the number of major earthquakes in that year. I will compile the amount of lower magnitude earthquakes, from a 3 to 5.5 magnitude, by year, from 1974 to present, and plot the total number of earthquakes per year as a line plot, with a subplot being the scatterplot of major earthquakes over time, to see if there is a correlation between an increase in number of lower magnitude earthquakes with a large number of major earthquakes.
import numpy as np
from matplotlib import cm
from cartopy import config
import cartopy.crs as ccrs
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import time
import scipy.stats
from sklearn.neighbors import KernelDensity
start_day = ''
end_day = ''
standard_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=time'
query_url = standard_url + '&starttime=' + '1800-01-01' + '&endtime=' + '2023-04-29' + '&maxlatitude=38.2'+ '&minlatitude=37' + '&maxlongitude=-121.6'+ '&minlongitude=-122.9'+'&minmagnitude=5.5'
query_url
'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=time&starttime=1800-01-01&endtime=2023-04-29&maxlatitude=38.2&minlatitude=37&maxlongitude=-121.6&minlongitude=-122.9&minmagnitude=5.5'
earthquake_data = pd.read_csv(query_url)
earthquake_data.head()
time | latitude | longitude | depth | mag | magType | nst | gap | dmin | rms | ... | updated | place | type | horizontalError | depthError | magError | magNst | status | locationSource | magSource | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1989-10-18T00:04:15.190Z | 37.036167 | -121.879833 | 17.214 | 6.90 | mh | 84.0 | 89.0 | 0.008108 | 0.08 | ... | 2023-02-23T07:58:03.072Z | Loma Prieta, California Earthquake | earthquake | 0.21 | 0.31 | NaN | 0.0 | reviewed | nc | nc |
1 | 1986-03-31T11:55:39.810Z | 37.479167 | -121.686667 | 8.502 | 5.70 | ml | 87.0 | 65.0 | 0.027030 | 0.06 | ... | 2022-04-27T22:23:44.508Z | 15 km NE of East Foothills, California | earthquake | 0.13 | 0.41 | NaN | 6.0 | reviewed | nc | nc |
2 | 1984-04-24T21:15:18.760Z | 37.309667 | -121.678833 | 8.193 | 6.20 | ml | 101.0 | 26.0 | 0.051350 | 0.06 | ... | 2022-04-27T23:27:43.358Z | 14 km E of Seven Trees, California | earthquake | 0.11 | 0.31 | NaN | 0.0 | reviewed | nc | nc |
3 | 1980-01-24T19:00:09.500Z | 37.852000 | -121.815000 | 11.000 | 5.80 | mw | NaN | NaN | NaN | NaN | ... | 2022-04-28T01:48:47.870Z | 8 km ENE of Blackhawk, California | earthquake | NaN | NaN | NaN | NaN | reviewed | b | 01009 |
4 | 1957-03-22T19:44:22.270Z | 37.557000 | -122.724000 | 15.000 | 5.66 | mw | NaN | NaN | NaN | NaN | ... | 2022-04-26T17:10:06.034Z | 18 km W of Montara, California | earthquake | NaN | 1.90 | 0.65 | NaN | reviewed | iscgem | iscgem |
5 rows × 22 columns
earthquake_data['time'] = pd.to_datetime(earthquake_data['time'])
earthquake_data.head()
time | latitude | longitude | depth | mag | magType | nst | gap | dmin | rms | ... | updated | place | type | horizontalError | depthError | magError | magNst | status | locationSource | magSource | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1989-10-18 00:04:15.190000+00:00 | 37.036167 | -121.879833 | 17.214 | 6.90 | mh | 84.0 | 89.0 | 0.008108 | 0.08 | ... | 2023-02-23T07:58:03.072Z | Loma Prieta, California Earthquake | earthquake | 0.21 | 0.31 | NaN | 0.0 | reviewed | nc | nc |
1 | 1986-03-31 11:55:39.810000+00:00 | 37.479167 | -121.686667 | 8.502 | 5.70 | ml | 87.0 | 65.0 | 0.027030 | 0.06 | ... | 2022-04-27T22:23:44.508Z | 15 km NE of East Foothills, California | earthquake | 0.13 | 0.41 | NaN | 6.0 | reviewed | nc | nc |
2 | 1984-04-24 21:15:18.760000+00:00 | 37.309667 | -121.678833 | 8.193 | 6.20 | ml | 101.0 | 26.0 | 0.051350 | 0.06 | ... | 2022-04-27T23:27:43.358Z | 14 km E of Seven Trees, California | earthquake | 0.11 | 0.31 | NaN | 0.0 | reviewed | nc | nc |
3 | 1980-01-24 19:00:09.500000+00:00 | 37.852000 | -121.815000 | 11.000 | 5.80 | mw | NaN | NaN | NaN | NaN | ... | 2022-04-28T01:48:47.870Z | 8 km ENE of Blackhawk, California | earthquake | NaN | NaN | NaN | NaN | reviewed | b | 01009 |
4 | 1957-03-22 19:44:22.270000+00:00 | 37.557000 | -122.724000 | 15.000 | 5.66 | mw | NaN | NaN | NaN | NaN | ... | 2022-04-26T17:10:06.034Z | 18 km W of Montara, California | earthquake | NaN | 1.90 | 0.65 | NaN | reviewed | iscgem | iscgem |
5 rows × 22 columns
plt.scatter(earthquake_data['time'], earthquake_data['mag'])
plt.xlabel('Year')
plt.ylabel('Magnitude')
plt.title('Major Earthquakes since 1800')
plt.xlim(pd.to_datetime('1800-01-01'), pd.to_datetime('2023-01-01'))
plt.show()
fig = plt.figure(figsize=(7,7))
ax = plt.axes(projection=ccrs.Robinson(-46.0))
ax.set_extent([-123, -121.5, 36.9, 38.3], crs=ccrs.PlateCarree())
plt.scatter(earthquake_data['longitude'],earthquake_data['latitude'],transform=ccrs.PlateCarree(), s = (earthquake_data['mag']-5)*100, color = 'red', marker = 'o', zorder = 100)
plt.title('Earthquakes at least 5.5 Magnitude (1800 to 2023)')
ax.coastlines()
ax.stock_img()
ax.gridlines()
ax.legend(['Size of point: Relative\nMagnitude'])
plt.show()
import plotly.graph_objects as go
fig = go.Figure(go.Scattermapbox(
lat = earthquake_data['latitude'],
lon = earthquake_data['longitude'],
mode = 'markers',
marker=go.scattermapbox.Marker(
size=(earthquake_data['mag']-5)*10)))
fig.update_layout(
autosize=True,
mapbox = {
'style': "stamen-terrain",
'center': {'lon': -122.2, 'lat': 37.6 },
'zoom': 6.8},
showlegend = False, title = 'Map of Earthquakes at least 5.5 Magnitude (1800 to 2023) in the region of interest (size = Magnitude)')
fig.show()
def seconds_to_years_dec(seconds):
minutes = seconds/60.0
hours = minutes/60.0
days = hours/24.0
years = days/365.25
return years
earthquake_data['time_btwn'] = 0.0
for i in range(1, len(earthquake_data)):
earthquake_data.at[i, 'time_btwn'] = seconds_to_years_dec((np.array(earthquake_data['time'])[i-1] - np.array(earthquake_data['time'])[i]).total_seconds())
earthquake_data.head()
time | latitude | longitude | depth | mag | magType | nst | gap | dmin | rms | ... | place | type | horizontalError | depthError | magError | magNst | status | locationSource | magSource | time_btwn | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1989-10-18 00:04:15.190000+00:00 | 37.036167 | -121.879833 | 17.214 | 6.90 | mh | 84.0 | 89.0 | 0.008108 | 0.08 | ... | Loma Prieta, California Earthquake | earthquake | 0.21 | 0.31 | NaN | 0.0 | reviewed | nc | nc | 0.000000 |
1 | 1986-03-31 11:55:39.810000+00:00 | 37.479167 | -121.686667 | 8.502 | 5.70 | ml | 87.0 | 65.0 | 0.027030 | 0.06 | ... | 15 km NE of East Foothills, California | earthquake | 0.13 | 0.41 | NaN | 6.0 | reviewed | nc | nc | 3.549640 |
2 | 1984-04-24 21:15:18.760000+00:00 | 37.309667 | -121.678833 | 8.193 | 6.20 | ml | 101.0 | 26.0 | 0.051350 | 0.06 | ... | 14 km E of Seven Trees, California | earthquake | 0.11 | 0.31 | NaN | 0.0 | reviewed | nc | nc | 1.931859 |
3 | 1980-01-24 19:00:09.500000+00:00 | 37.852000 | -121.815000 | 11.000 | 5.80 | mw | NaN | NaN | NaN | NaN | ... | 8 km ENE of Blackhawk, California | earthquake | NaN | NaN | NaN | NaN | reviewed | b | 01009 | 4.249401 |
4 | 1957-03-22 19:44:22.270000+00:00 | 37.557000 | -122.724000 | 15.000 | 5.66 | mw | NaN | NaN | NaN | NaN | ... | 18 km W of Montara, California | earthquake | NaN | 1.90 | 0.65 | NaN | reviewed | iscgem | iscgem | 22.841805 |
5 rows × 23 columns
plt.hist(earthquake_data['time_btwn'], bins = 30)
plt.xlabel('Years')
plt.ylabel('Count')
plt.title('Seismic Gaps between Major Earthquakes since 1800')
plt.show()
current_gap = seconds_to_years_dec((datetime.datetime.fromtimestamp(time.time()) - (np.array(earthquake_data['time'])[0])\
.tz_localize(None)).total_seconds())
print('Current seismic gap (last major earthquake to present): ', current_gap)
Current seismic gap (last major earthquake to present): 33.56345985848059
sample_means = []
for n in range(0,10000):
resampled = earthquake_data.sample(n=len(earthquake_data),replace=True)
mean_time = np.mean(resampled['time_btwn'])
sample_means.append(mean_time)
plt.figure(figsize=(10,4))
plt.hist(sample_means,alpha=0.5,density=True,color='orange',label='bootstrap resampled seismic gap mean')
plt.axvline(x=np.percentile(sample_means,2.5),linestyle='--',color='orange',label='95% confidence interval on seismic gap mean')
plt.axvline(x=np.percentile(sample_means,97.5),linestyle='--',color='orange')
plt.axvline(x=current_gap,linestyle='-',color='red', label = 'Current Seismic Gap')
plt.legend(loc='lower right')
plt.xlim(0,35)
plt.show()
gaps_mean = np.mean(earthquake_data['time_btwn'])
print('Mean of seismic gaps from 1800 to 2023 (not including present gap): ', gaps_mean)
Mean of seismic gaps from 1800 to 2023 (not including present gap): 5.3330114710784695
print("95% Confidence interval for mean of seimic gaps: ", np.percentile(sample_means,2.5), np.percentile(sample_means,97.5))
95% Confidence interval for mean of seimic gaps: 2.9092289886017473 8.250885968560151
gaps_std = np.std(earthquake_data['time_btwn'])
print('Standard deviation of seismic gaps: ', gaps_std)
Standard deviation of seismic gaps: 7.992422158832153
z_score_current_gap = (current_gap - gaps_mean) / gaps_std
print('Z-score of current gap: ', z_score_current_gap)
Z-score of current gap: 3.5321518090990245
# Want to find probability that a seismic gap would be at least as long as the current gap of roughly 33.552 years.
# To do this find probability that Z is greater than 3.53074
# Using the z-table and Z-Score Probability Calculator (listed under sources), the probability that Z-score
#is greater than 3.53074 is 1 - 0.9998 = 0.0002
# Using a confidence level of 95%, since 0.0002 is less than 0.05, our current seismic gap is very unlikely to occur if the
# distribution of earthquake gaps was normal.
start_day = ''
end_day = ''
standard_url2 = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=time'
query_url2 = standard_url2 + '&starttime=' + '1974-01-01' + '&endtime=' + '2023-04-29' + '&maxlatitude=38.2'+ '&minlatitude=37' + '&maxlongitude=-121.6'+ '&minlongitude=-122.9'+'&minmagnitude=3'+'&maxmagnitude=5.5'
query_url2
'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&orderby=time&starttime=1974-01-01&endtime=2023-04-29&maxlatitude=38.2&minlatitude=37&maxlongitude=-121.6&minlongitude=-122.9&minmagnitude=3&maxmagnitude=5.5'
earthquakes_low_mag = pd.read_csv(query_url2)
earthquakes_low_mag.head()
time | latitude | longitude | depth | mag | magType | nst | gap | dmin | rms | ... | updated | place | type | horizontalError | depthError | magError | magNst | status | locationSource | magSource | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2023-04-26T07:34:33.370Z | 37.286833 | -121.657667 | 6.26 | 3.10 | ml | 146.0 | 31.0 | 0.03041 | 0.11 | ... | 2023-05-02T08:49:20.401Z | 16km E of Seven Trees, CA | earthquake | 0.08 | 0.18 | 0.152 | 109.0 | reviewed | nc | nc |
1 | 2023-04-20T03:35:47.710Z | 37.916333 | -121.839667 | 16.04 | 3.65 | mw | 207.0 | 32.0 | 0.05353 | 0.26 | ... | 2023-05-12T02:18:34.983Z | 9km ESE of Clayton, CA | earthquake | 0.15 | 0.39 | NaN | 3.0 | reviewed | nc | nc |
2 | 2023-04-01T16:43:48.280Z | 37.749167 | -122.143333 | 7.62 | 3.01 | ml | 149.0 | 15.0 | 0.03141 | 0.10 | ... | 2023-04-23T21:08:21.040Z | 3km NNE of San Leandro, CA | earthquake | 0.08 | 0.15 | 0.244 | 107.0 | reviewed | nc | nc |
3 | 2023-04-01T16:24:56.230Z | 37.750500 | -122.142667 | 7.58 | 3.00 | ml | 147.0 | 15.0 | 0.03217 | 0.10 | ... | 2023-04-23T21:04:12.040Z | 3km NNE of San Leandro, CA | earthquake | 0.08 | 0.17 | 0.252 | 128.0 | reviewed | nc | nc |
4 | 2023-03-28T13:01:18.360Z | 37.617333 | -122.477333 | 9.13 | 3.40 | ml | 133.0 | 69.0 | 0.01982 | 0.07 | ... | 2023-05-12T03:42:22.691Z | 1km ENE of Pacifica, CA | earthquake | 0.12 | 0.17 | 0.170 | 146.0 | reviewed | nc | nc |
5 rows × 22 columns
yearly_count_low_mag = (pd.to_datetime(earthquakes_low_mag['time']).dt.year.value_counts()).to_frame()
yearly_count_low_mag = yearly_count_low_mag.reset_index()
yearly_count_low_mag.head()
index | time | |
---|---|---|
0 | 1989 | 178 |
1 | 1984 | 60 |
2 | 1980 | 57 |
3 | 1986 | 48 |
4 | 1990 | 38 |
plt.scatter(yearly_count_low_mag['index'], yearly_count_low_mag['time'])
plt.xlabel('Year')
plt.ylabel('Count')
plt.title('Number of Smaller Magnitude Earthquakes by year (1974-2023)')
Text(0.5, 1.0, 'Number of Smaller Magnitude Earthquakes by year (1974-2023)')
fig, (ax1, ax2) = plt.subplots(2)
plt.subplots_adjust(hspace = 0.5)
arrow_args = dict(arrowstyle="->")
sns.lineplot(data = yearly_count_low_mag, x = "index", y = "time", ax = ax1)
ax1.margins(x=0)
ax1.set(xlabel = 'Year', ylabel = 'Count', title = 'Smaller Magnitude (3 to 5.5 mag) Earthquakes by Year (1980-2023)')
plt.scatter(earthquake_data['time'], earthquake_data['mag'])
ax2.annotate('1989 Loma Prieta Earthquake', xy=(pd.to_datetime('1989-10-18'), 6.90), xycoords='data',
xytext=(20,-5), textcoords='offset points',
arrowprops=arrow_args)
plt.xlim(pd.to_datetime('1974-01-01'), pd.to_datetime('2023-01-01'))
plt.ylim(0,8)
plt.xlabel('Year')
plt.ylabel('Magnitude')
plt.title('Major Earthquakes by year (1980-2023)')
plt.show()
For the geological region that I chose to focus on, based on the data I collected and filtered for, there has not been a major earthquake, of magnitude at least 5.5, in many years. After calculating the seismic gaps between earthquakes from 1800 to 2023, I graphed their distribution and calculated the current seismic gap. I was surprised by how large the current seismic gap is, at roughly 33.55 years, with the last earthquake over a magnitude 5.5 being the 1989 Loma Prieta Earthquake. Based on the histogram I plotted, the current seismic gap is larger than any of the seismic gaps since 1800. Most seismic gaps were between 0 and 6 years with some outliers, including seismic gaps of up to 30 years. To estimate the average estimated mean seismic gap for earthquakes in the region from 1800 to 2023, I used a 95% confidence interval using bootstrap resampling. My initial sample mean was 5.333 and my 95% confidence interval estimates the mean of the seismic gap to be between 2.833 and 8.164 years. The mean of the seismic gaps gives a benchmark of a typical period between earthquakes, which makes me believe that our current seismic gap is definitely unusual. In order to test the actual likelihood of the current gap based on the mean, I used a Z-score to calculate the probability of a seismic gap as extreme or more extreme than our current one to be observed based on a normal distribution with the mean and standard deviation of the data. I found a Z-score of 3.53, which translates to the probability of an observation as extreme or more extreme as our current seismic gap being about 0.0002 = 0.02%. Our current seismic gap is very unlikely to occur based on our data, so it does seem like the current gap in major seismic activity based on data from 1800 to present in the San Francisco Peninsula, South Bay, and East Bay, is unusual.
When looking into the relationship between the number of smaller earthquakes per year and whether a large earthquake had occurred, I had to first decide which magnitudes to count under smaller earthquakes. In order to not have too many data points, I chose earthquakes with magnitudes between 3 and 5.5 as smaller earthquakes. One thing I noticed is that since I wanted to compare the total number of smaller earthquakes by year, smaller earthquakes only started being recorded consistently from around 1974 to the present. Based on the graph of the number of smaller earthquakes per year compared to the major earthquakes from 1974 to the present, the major earthquakes line up very closely with the years when there were the most smaller earthquakes. For example, during 1989, the year of the major Loma Prieta earthquake, there was a very large spike in the number of smaller earthquakes that year, more than twice as many smaller magnitude earthquakes that year than any other year from 1974 to 2022. Considering that there were 4 years with major earthquakes from 1974 to 2022 and each of these years corresponded to one of the 4 highest numbers of smaller earthquakes, I am confident in stating that there is a strong correlation between whether or not there was a major earthquake in a year and how many smaller (3 to 5.5 mag) earthquakes there were that year.
One thing that I could investigate further is researching the history of the specific plates’ activity more. For example, the San Andreas Fault extends beyond the area that I studied, and it could be that there has been more activity in other parts of the fault that I did not include in my project. Also, I could look more into whether 5.5 is a good threshold for a major earthquake that actually impacts the community and could be dangerous. In addition, it may be unreliable to filter data in the way that I did, because I found one case where an earthquake of about 5.5 magnitude was filtered out, an earthquake in 2007 near East Foothills, CA, which is in the area of interest, that had a magnitude at the origin of 5.5 and a magnitude of Moment Tensor of 5.4. It would be helpful to understand why this and other earthquakes were filtered out and whether these edge-cases are worth including in the dataset. Finally, to expand further on the project, it would be interesting to research whether high numbers of smaller earthquakes cause large earthquakes or if major earthquakes cause increased seismic activity afterwards.
Acknowledgements/Works Cited:
James R. Holliday, Chien-chih Chen, Kristy F. Tiampo, John B. Rundle, Donald L. Turcotte, Andrea Donnellan; A RELM Earthquake Forecast Based on Pattern Informatics. Seismological Research Letters 2007;; 78 (1): 87–93. doi: https://doi.org/10.1785/gssrl.78.1.87
“Latest Earthquakes.” Earthquakes | U.S. Geological Survey, https://earthquake.usgs.gov/earthquakes/map/?extent=36.83347,-122.94525&extent=38.08701,-120.07233&range=search&timeZone=utc&search=%7B%22name%22:%22Search%20Results%22,%22params%22:%7B%22starttime%22:%221800-01-01%2000:00:00%22,%22endtime%22:%222023-04-29%2023:59:59%22,%22maxlatitude%22:38.2,%22minlatitude%22:37,%22maxlongitude%22:-121,%22minlongitude%22:-122.9,%22minmagnitude%22:5.5,%22orderby%22:%22time%22%7D%7D.
“Map of Known Active Geologic Faults in the San Francisco Bay Region.” Map of Known Active Geologic Faults in the San Francisco Bay Region | U.S. Geological Survey, www.usgs.gov/media/images/map-known-active-geologic-faults-san-francisco-bay-region.
Probabilities of Large Earthquakes in the San Francisco Bay Region, California. United States, Department of the Interior, U.S. Geological Survey, 1990. https://www.google.com/books/edition/Probabilities_of_Large_Earthquakes_in_th/aguKoa2cXgQC?hl=en&gbpv=0.
“Z Table and Z Score Calculation.” Z SCORE TABLE, www.z-table.com/.