2019 Modeling work for WA EMD, covering parts of Island and Skagit Counties, including La Conner, WA.
Some of the directories containing big files are not included in the git repository but can be downloaded from the tar file found at http://depts.washington.edu/ptha/IslandSkagitTHA_2019/ and then moved to the appropriate location. These include:

- `topo/topofiles`
- `dtopo/dtopofiles`
- `info/fgmax_regions`
- `report/figs`
- `topo` : scripts and notebooks to manipulate DEMs and topo files
- `topo/topofiles` : topofiles to use in job runs
- `dtopo` : scripts and notebooks to manipulate dtopo files
- `dtopo/dtopofiles` : dtopofiles to use in job runs
- `info` : tide gauge locations, other general info
- `info/fgmax_regions` : contains Google Earth images for plotting purposes
- `new_code` : custom Python and Fortran code for this project
- `Runs` : directory containing input for all job runs
- `report` : directory containing latex files for the final report

Within `Runs` there are directories for each fgmax location, and within each of these there are subdirectories for each event: `SFL`, `CSZ_L1`, and `AK`.
The `topo/topofiles` directory contains topo files and also `.kml` and `.png` files that show the areas covered by each.

`topo/merge_PS_PT_overlap.py` was used to merge parts of the Puget Sound (PS) and Port Townsend (PT) DEMs together with an overlap DEM provided by Kelly Carignan at NCEI into a single 1/3" DEM covering the study region:

    topo/topofiles/PTPS_merged_13s.asc

This DEM is referred to as `PTPSm-DEM` in the report. A `.nc` (netCDF) version of this file is used by some of the Python scripts and Jupyter notebooks for plotting purposes. It can be created from the `.asc` file using the script `topo/convert_PTPS_merged_asc2nc.py`.
`WA_EMD_2018/topo/coarsen_PS_PT_SJdF.py` was previously used to coarsen the PS and PT DEMs (and also the Strait of Juan de Fuca DEM) from 1/3" to 2", and those topo files were taken from the 2018 work:

    topo/topofiles/PT_2sec_center.asc
    topo/topofiles/PS_2sec_center.asc
    topo/topofiles/SJdF_2sec_center.asc

Outside of the Strait and Puget Sound, 1 arcminute etopo1 topography was used. A single etopo1 topofile should have sufficed, but because the Alaska work was done later, two different files were used for different sources:

    etopo1_-137_-122_38_54_1min_nc.asc
    etopo1_-180_-66_-48_65_1min.tt3

A new, smaller version can instead be used for all runs:

    etopo1_-163_-122_38_63_1min.asc

This file was obtained from the NCEI thredds server using the script `topo/fetch_etopo1.py`.
The study region was first split into a set of polygons defined as Ruled Rectangles. The fgmax points for each region were selected to be all points in the corresponding polygon with DEM elevation lower than 15 m (including all water points in the polygon). The `params.py` file in each run directory has a pointer to the resulting file. Masks are used to specify any points that must be forced to be dry initially even if the topo elevation is below MHW.
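The selection rule described above can be sketched with numpy. This is a sketch only: the function name and toy DEM are illustrative, not the project's actual selection script.

```python
import numpy as np

def select_fgmax_points(Z, z_max=15.0):
    """Return a 1/0 mask marking fgmax points: all DEM points with
    elevation below z_max.  This automatically includes all water
    points, where Z < 0 (elevations relative to MHW)."""
    return np.where(Z < z_max, 1, 0)

# Toy 3x3 "DEM" in meters; negative values are offshore.
Z = np.array([[-5., 10., 20.],
              [-2., 14., 30.],
              [ 0.,  5., 16.]])
mask = select_fgmax_points(Z)   # 1 at the six points below 15 m
```

The resulting 1/0 array is exactly the topotype-3-style mask format used for the fgmax input files described below.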
A directory for an fgmax location (study region) has a name like `SW_Whidbey`. This directory contains a subdirectory for each event, such as `SFL`, and also a subdirectory `input_files` with the files needed to run the code. These files must be created by running the script `input_files/make_input_files.py`. This Python script was generated from the Jupyter notebook `make_input_files.ipynb` in the location directory, so alternatively you can run the notebook and then move the `.data` files generated to the `input_files` subdirectory.
Notes:

- Creating the input netCDF file (which will have results added to it later) requires `netCDF4` and supporting libraries. The conda environment defined by `environment_geomin.yml` should provide the minimum needed for running things.
- The file `topo/topofiles/PTPS_merged_13s.nc` is not included in the `big_files` tar file, but can be generated using the script `topo/convert_PTPS_merged_asc2nc.py`.
The files generated include (e.g., for location `SW_Whidbey`):

- `fgmax_pts_topostyle_SW_Whidbey.data` : The fgmax points as a topotype 3 file covering a rectangle, with 1/0 values indicating whether each point is an fgmax point or not.
- `RuledRegion_SW_Whidbey_ixy2.data` : Description of a ruled rectangle to be used as an AMR flagregion at the finest level, covering all fgmax points. Note: `ixy2` means `s=y` in the ruled region, while `ixy1` means `s=x`.
- `allow_wet_init_SW_Whidbey.data` : Optional topotype 3 file with 1/0 entries. If there is dry land below MHW in this study region, this file indicates which points are allowed to be initialized to wet (1) and which are required to be dry initially (0).
- `SW_Whidbey_input.nc` : A netCDF file containing the arrays indicating the `fgmax_pts` and `allow_wet_init`. After the code is run, this file is used as the basis for a netCDF file that also contains the model results.
These files are created by the Jupyter notebook `make_input_files.ipynb`, or by a Python script `input_files/make_input_files.py` created from this notebook (see below). Several figures (and png files) are also created. These are made in the main directory and then moved to `input_files` for safekeeping.
Within an event directory, such as `SW_Whidbey/SFL`, there are the following files:

- `Makefile` : Note that `EXE` might be set to `$(WA_EMD_2019)/xgeoclaw` so that a common executable can be used without recompiling in each event directory. Make sure you have this file and that it is up to date with the Fortran code in the `new_code` directory. `make .exe` should create it if necessary. Make sure `$CLAW` is pointing to `clawpack-v5.6.1`.
- `setrun.py` : This generally does not need to change, unless you want to modify how often checkpoint files are created, or something else not set in `params.py`.
- `params.py` : Most of the parameters that change from one location or event to another are collected in `params.py`. This is read in by `setrun` and also by `setplot` and by the `process_fgmax` notebook or script. If you change something in `params.py`, remake the data via `make data` before running the code again. Or use `make run` to do `make data` followed by `make output`.
If copying these files to a new directory to create a different event at this location, the following parameters in `params.py` are particularly important to modify:

- `loc` and `event` should be set for this location / event.
- `fgmax_extent` is used in `setplot` for zoomed views.
- `restart` : Set to `True` to do a restart, and set `restart_file` to point to the desired file to start from. Note that the files `_output/fort.tck*` give the times of each checkpoint file, while the data needed to restart is in the corresponding `_output/fort.chk*` file.
- `tfinal` and `num_output_times` control when the job ends (in seconds) and how many time frame files are produced along the way. If doing a restart, adjust these to the new (later) final time and the desired total number of output times (including those time frames already generated).
- `lower` and `upper` arrays specify the computational domain. Note that these are shifted left 1/6 arcsecond from values that should be chosen as multiples of 0.01 degree, in order to ensure that the finest computational grids at 1/3 arcsecond have cell centers aligned with the DEM grid points.
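The 1/6" shift can be checked with a little arithmetic. This is a sketch; the longitude value here is illustrative, not one of the project's actual domain edges.

```python
# A 1/3" cell is 1/10800 degree wide.  Shifting the domain edge left by
# half a cell (1/6") puts the first cell *center* exactly on the chosen
# multiple of 0.01 degree, where the DEM grid points lie.
third = 1.0 / (3 * 3600)        # 1/3 arcsecond in degrees
x_chosen = -122.50              # illustrative, a multiple of 0.01 degree
x_lower = x_chosen - third / 2  # shifted value to put in params.py
first_center = x_lower + third / 2   # lands back on x_chosen
```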
- `num_cells` sets the number of grid cells at the coarsest level (Level 1) over the computational domain. Usually far-field events (like Alaska) have a larger computational domain and coarser Level 1 grid than near-field events.
- `amr_levels_max` and `refinement_ratios` control the maximum number of AMR levels and the refinement factor from each level to the next.
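As a sketch of how these interact: the finest-grid cell size is the Level 1 cell size divided by the product of the refinement ratios. The values below are illustrative, not necessarily the project's actual settings; they are chosen so the finest level comes out at 1/3 arcsecond, matching the DEM resolution discussed above.

```python
# Illustrative: a 2-degree Level 1 cell refined by these factors
# reaches 1/3 arcsecond at Level 6.
coarse_dx_arcsec = 2.0 * 3600.0       # Level 1 cell size, in arcseconds
refinement_ratios = [4, 6, 5, 6, 30]  # example Level k -> k+1 factors
dx = coarse_dx_arcsec
for r in refinement_ratios:
    dx /= r                           # cell size at each finer level
# dx is now 1/3 arcsecond
```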
- `topofiles` is a list of the topo files used for this run.
- `dtopofiles` is a list containing a single entry for the dtopo file used for this run (the earthquake source). It should be consistent with how `event` is set and with the name of the directory you are working in.
- `regions` is a list of AMR flagging regions in the old style, i.e. each region is specified as a list of the form `[minlevel, maxlevel, t1, t2, x1, x2, y1, y2]`.
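For example, an old-style region allowing up to 3 levels over a coastal box for all times might look like this (the coordinate and level values are illustrative only):

```python
# Old-style AMR flagging region:
# [minlevel, maxlevel, t1, t2, x1, x2, y1, y2]  -- illustrative values
regions = []
regions.append([1, 3, 0.0, 1e9, -127.0, -122.0, 46.0, 49.0])
```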
- `flagregions` is a list of AMR flagging regions in the new style, i.e. each is an object of class `data_FlagRegions.FlagRegion`. The module `data_FlagRegions.py` is in the `new_code` directory for now; eventually this will be merged into `amrclaw/data.py` for more general usage. Note that `flagregion.spatial_region_file` points to a file that specifies a `RuledRectangle` that must be created to cover the spatial region desired. Files of this form covering the fgmax points are created in the notebook `make_input_files.ipynb` for this location, and should be used for refinement to the finest (1/3 arcsecond) level. Others are made in the notebook `topo/make_flag_regions.ipynb` that cover larger regions more efficiently than old-style rectangles would.
- `wave_tolerance` : The tolerance used to determine whether to flag a cell for refinement, if it is not forced to be refined or restricted from further refinement by one of the regions or flagregions. A cell is flagged for refinement if `abs(eta) > wave_tolerance`, where `eta = h + B` is the surface elevation computed from water depth `h` and GeoClaw topography `B`. This may need to be adjusted, e.g. made smaller if the tsunami is small in the region of interest.
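The flagging test described above can be sketched as follows (a sketch of the criterion only, not the actual GeoClaw flagging routine; sea level is taken as 0 here):

```python
def flag_for_refinement(h, B, wave_tolerance):
    """Flag a cell when the surface elevation eta = h + B deviates
    from sea level (0, i.e. MHW) by more than wave_tolerance."""
    eta = h + B
    return abs(eta) > wave_tolerance

# 10.2 m of water over topography at -10 m gives eta = 0.2 m:
flag_for_refinement(10.2, -10.0, wave_tolerance=0.1)   # True
flag_for_refinement(10.05, -10.0, wave_tolerance=0.1)  # False
```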
- `fgmax_files` : A list containing a single header file describing the fgmax parameters for this run. The header file is read in by the Fortran code and includes a path to the file that contains all the fgmax points. Note some changes from previous projects:
    - The header file is created directly from `params.py` with parameters set here. Previously there was a separate script `make_fgmax_header.py` that had to be run.
    - `fg.xy_fname` should be set to the absolute path of a file in the form of a topotype 3 topography file, with values 1/0 indicating whether a point on this rectangular grid is an fgmax point or not. This file is created by the `make_input_files` notebook or script in the location directory. In previous work we specified the fgmax points in a different format, a file with one line for each point, but specifying them as a topo-file mask works better in various ways. This style is specified by setting `fg.point_style = 10` and `fg.npts = 0`.
Other fgmax parameters may need to be adjusted:

- `fg.min_level_check` should generally be set to the finest level. The values at fgmax points are only monitored on this level (or finer).
- `fg.tstart_max` : The time to start monitoring fgmax values, which should generally be a bit after the finest level is first created in the fgmax region (e.g. set to `t_start + 20`). Note that cells are flagged for refinement at the next regrid time after `t_start`, but this may not be exactly at time `t_start`.
- `fg.dt_check` : How often to update the running maxima stored at the fgmax points. Usually 20 (seconds) is ok. Note that 1/3 arcsecond grids are generally updated with a time step `dt` of less than 1 second, but we don't need to update the fgmax points this frequently (since it is time-consuming to do so). You do need to update them frequently enough that a maximum is not missed by much, so an appropriate value depends on the location of the fgmax region and the nature of the source.
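Putting the fgmax settings above together, the relevant part of `params.py` might look something like this. This is a hedged sketch: the stand-in `FGmaxGrid` class, the path, and the level/time values are illustrative, not a verbatim copy of the project file.

```python
class FGmaxGrid:
    """Stand-in for the fgmax grid object configured in params.py."""
    pass

fg = FGmaxGrid()
fg.point_style = 10        # fgmax points given as a 1/0 topotype 3 mask
fg.npts = 0                # unused when point_style == 10
fg.xy_fname = '/full/path/to/fgmax_pts_topostyle_SW_Whidbey.data'
fg.min_level_check = 6     # illustrative: the finest AMR level
t_start = 0.0              # illustrative start time for this source
fg.tstart_max = t_start + 20.0  # begin monitoring once fine grids exist
fg.dt_check = 20.0         # update running maxima every 20 s
```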
- `gauges` : A list of gauges to monitor, each in the form of a list `[gaugeno, x, y, t1, t2]`. Some gauges are read in from a csv file listing gauges specified by DNR, and those found to lie in the region specified by `fgmax_extent` are automatically included.
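The automatic inclusion step can be sketched like this. The csv column names (`gaugeno`, `x`, `y`) and the sample data are hypothetical; check the actual DNR csv file for its layout.

```python
import csv
import io

def gauges_in_extent(csv_text, extent, t1=0.0, t2=1e9):
    """Keep only the gauges lying inside extent = [x1, x2, y1, y2],
    returning them in the [gaugeno, x, y, t1, t2] form used in params.py.
    (Column names 'gaugeno', 'x', 'y' are hypothetical.)"""
    x1, x2, y1, y2 = extent
    gauges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        x, y = float(row['x']), float(row['y'])
        if x1 <= x <= x2 and y1 <= y <= y2:
            gauges.append([int(row['gaugeno']), x, y, t1, t2])
    return gauges

sample = "gaugeno,x,y\n101,-122.60,48.00\n102,-123.50,47.00\n"
gauges = gauges_in_extent(sample, extent=[-122.7, -122.5, 47.9, 48.1])
# only gauge 101 falls inside the extent
```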
- `variable_sea_level` : Set to `True` if the earthquake deformation affects the study location (i.e. for the SFL event in southern locations, but not for the CSZ or Alaska events). This controls how the finest grids are initialized, filling with water only up to sea level as adjusted by the dtopo file.
- `use_wet_mask` : Set to `True` if there are some areas in this location that are below MHW but should be initialized as dry because they are behind dikes or levees. If so, then you must also set `fname_wet_mask` to point to the file created by the `make_input_files` notebook or script.
- `t_stays_dry` : The last time for which the wet mask should be used when creating new AMR grids. After this time it is ignored. It should be set late enough that the finest grids around the fgmax points have been created properly. At later times the tsunami may have arrived, so continuing to apply the mask would be inappropriate and would also add cost to running the code. Usually `t_start + 120` seems fine.
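A sketch of how such a mask affects initialization (this illustrates the idea described above, not the actual GeoClaw initialization code):

```python
import numpy as np

def initial_depth(B, allow_wet, sea_level=0.0):
    """Fill with water up to sea_level only where allow_wet == 1;
    points with allow_wet == 0 start dry even if below MHW
    (e.g. behind dikes or levees)."""
    h = np.maximum(sea_level - B, 0.0)   # naive fill to sea level
    return np.where(allow_wet == 1, h, 0.0)

B = np.array([-2.0, -1.0, 1.0])      # topo: two points below MHW
allow_wet = np.array([1, 0, 1])      # middle point is behind a dike
h0 = initial_depth(B, allow_wet)     # -> [2.0, 0.0, 0.0]
```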
Some other parameters in `params.py` are only used in post-processing; see comments in this file.
After running GeoClaw, the results are in `_output` by default. This contains time frame results at the times listed in the `fort.t*` files, and also the fgmax results, in the files:

    _output/fort.FG1.valuemax
    _output/fort.FG1.aux1

These files list only the fgmax points, one per line.
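Since these files are plain text with one point per line, they can be read with `numpy.loadtxt`. The column layout below is an assumption for illustration (x and y first, a monitored maximum last); check the fgmax header file and the GeoClaw documentation for the actual ordering for your settings.

```python
import io
import numpy as np

# Two fake lines standing in for _output/fort.FG1.valuemax.
# Assumed layout (illustrative): x, y, level, max value.
sample = io.StringIO(
    "-122.60 48.00 6 1.25\n"
    "-122.59 48.00 6 0.80\n")
d = np.loadtxt(sample)
x, y, vmax = d[:, 0], d[:, 1], d[:, -1]
```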
The files can be post-processed using:

- `setplot.py` : For making plots of the time frames after running the code via `make plots` (or interactively with `Iplotclaw`).
- `process_fgmax.ipynb` : After running the code, this reads in the fgmax results (from the specified `outdir`) and produces plots of the maximum flow depth and speed. It also creates a netCDF file with all of the results as arrays on the topo grid used in creating the input netCDF file described above (copying the input file and adding the output). The resulting file has a name like `SW_Whidbey_results.nc`.
For this project, processing the results was automated using the Python script `new_code/process_fgmax_common.py`, which works for all locations/events.
The `.png` files created by `process_fgmax` (if `save_figs == True`) will be copied into `_plots/_other_figures/` so that they will be linked properly from `_plots/_PlotIndex.html` if you also run `make plots` to create the time frame plots and index in `_plots`. You can run `make plots` either before or after `process_fgmax` is run.

After producing the `.nc` file and `.png` files of the figures, you might want to move them (and also `_plots`) into a directory with a name like `SW_Whidbey_SFL_results_DATE` for safekeeping.
To turn a notebook (e.g. `process_fgmax.ipynb`) into a Python script that can be used to process the data without opening the notebook, do the following:

Either open the notebook and export it as a script using the "File --> Download as" menu, or run this command at the command line:

    jupyter nbconvert --to python process_fgmax.ipynb

Edit the resulting `process_fgmax.py` file to do the following:

- Remove any lines of the form `get_ipython().magic('...')`.
- Set desired values for `save_figs` and `save_fgmax_nc_file`, or other parameters depending on the notebook.

Then you can run the script from the command line using

    python process_fgmax.py
For `make_input_files.py` you can also do this:

    jupyter nbconvert --to python --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags="['hide-py']" make_input_files.ipynb

This will suppress cells tagged `hide-py`, removing the cells that make plots and producing only the input files themselves. (It also removes the magic command.) This only works if `jupyter nbconvert --version` shows version >= 5.3.0.

Move the resulting `make_input_files.py` to `input_files` if you want to be able to run the script and make the files in place, which is useful when running on a remote cluster.
To turn a notebook (e.g. `process_fgmax.ipynb`) into an html file for archiving or sharing, do the following:

Either open the notebook (and execute it if you want the output cells and plots to appear) and export it as html using the "File --> Download as" menu, or run this command at the command line:

    jupyter nbconvert --to html --execute make_input_files.ipynb

Note that the `--execute` flag will cause it to execute the notebook before converting. If the output is already in the notebook, or you want to create an html file of the input cells alone, leave this out. If a cell times out, you can add the argument `--ExecutePreprocessor.timeout=-1` for unlimited timeout.
See the detailed instructions in `README_install.txt`.

Our production runs were done on the University of Colorado HPC cluster using time provided by CSDMS. The main code repository was stored in the permanent home directory, but each `Makefile` also specifies a scratch directory where the output was directed.
Some notes on the workflow there:

- Within the repository: `git pull` to get recent changes.
- If input_data does not yet exist, submit a job to create it: `sbatch ~/make_input_data.sbatch`
- Modify the `Makefile` to set `SCRATCH` properly.
- Submit a job to run GeoClaw: `sbatch ~/run_geoclaw12.sbatch` (using 12 threads). If doing a restart, use the full path to the scratch directory for the checkpoint file.
- `_plots` should now exist with time frame plots. If not, `sbatch ~/make_plots.sbatch` will run `make plots` only.
- Copy `params.py` to the SCRATCH directory, and copy the location's `input_files` directory to the parent of the SCRATCH directory.
- To make fgmax plots in `_plots/_other_figures`, and the results `.nc` file: `sbatch ~/process.sbatch`. Also copy `_output/timing.*` to `_plots/_timing_figures`, and copy the `.nc` file to `_plots` if you want to download that too.
- Copy fgmax and gauge output out of the scratch directory for safekeeping: `python $WA19P/new_code/copy_output.py`
- Use `cp_output_cu.sh` to copy the `_plots` directory down.
(Somewhat out of date since the webpage was reorganized.)

Use `Runs/cp_plots.py` to copy most results from each fgmax region to `results_plots`. Before running this, make sure `_plots` in each directory has the most recent results (or is a symbolic link to the most recent results). Update `results_plots/index.html` if necessary.
Run `results_plots/collect_kmlfiles.py` to collect all the kml files and required png files from each `_plots/_other_figures` directory into a single `results_plots/kmlfiles` directory. Tar and gzip this directory to make `kmlfiles.tar.gz` and then:

    scp kmlfiles.tar.gz ptha@homer.u.washington.edu:public_html/WA_EMD_2019/results/
Use `Runs/all_gauges` to collect final gauge results from all runs. Note that this includes all gauges included in the run, even ones not to be used because a different run has better results for that gauge.

To collect desired gauge plots from the appropriate region:

    python results_plots/Gauges/make_gauge_html.py

This reads the region mapping from `info/gauges_regions.csv` and the most recent gauge png/csv/nc files from `Runs/all_gauges`, and puts them all in `results_plots/Gauges`. It also creates the index `results_plots/Gauges/index.html`.
Then:

    rsync -avz Gauges/ ptha@homer.u.washington.edu:public_html/WA_EMD_2019/results/results_plots/Gauges/

Or tar and gzip `results_plots/Gauges` to make `Gauges.tar.gz` and then:

    scp Gauges.tar.gz ptha@homer.u.washington.edu:public_html/WA_EMD_2019/results/