The purpose of this guide is to orient scientists and other data users to California’s commercial gillnet logbook data. These data are confidential and are only provided through authorized data sharing agreements. This guide seeks to help scientists and other data users anticipate how these data are formatted and how they may have to “clean” (a.k.a., process) the data before they can be used in analysis. I also seek to highlight special tricks and caveats gleaned from my experience using the data. Ultimately, I hope this guide will help clarify what data are available and how these data can be processed to maximize their utility.
Overview
The California Department of Fish and Wildlife (CDFW) has been compiling catch statistics on California’s fisheries since 1916. CDFW has required commercial gillnet vessels to submit logbooks since 1982. Additional information is available here.
This guide provides an overview of California’s commercial gillnet logbook data based on a data request processed in January 2023 for all gillnet logbooks from 1982-2022. In this guide, I review the attributes of the gillnet logbook data, the steps required to clean the raw data, and some visualizations of non-confidential summaries of the gillnet data.
Note that the data I received do not include the skipper name, the landing receipt id, or the crew members' names or license numbers, even though all of these fields appear on the logbook form.
Data attributes
The gillnet logbook data include the following columns:
| Column | Description |
| --- | --- |
| SN | Logbook id |
| VESSEL_NAME | Vessel name |
| BOATNO | Coast Guard boat number |
| VESSEL_ID | Vessel id (e.g., 12345) |
| PERMIT | Permit id |
| FISHING_DATE | Date of fishing |
| Year | Year of fishing |
| TARSPC | Target species – better version |
| Final Target Species | Target species |
| DRIFT_SET | Net type (drift, set) |
| Final Net Type (Set, Drift) | Net type (drift, set) – better version |
| FG_BLOCKS | Block id |
| DEPTHS | Depths (fathoms) |
| NET_LENGTH | Net length (fathoms) |
| MESH_SIZE | Mesh size (in) |
| BOUY_LINE_DEPTH | Buoy line depth (ft) |
| HOURS_NET_SOAKED | Soak time (hrs) |
| COMMON_NAME | Common name, format 1 (e.g., California halibut) |
| FinalMLDS_Common_Name | Common name, format 2 (e.g., Halibut, California) |
| MLDS_Species_Code | Species code |
| STATUS | Status (kept, lost, released) |
| NUM_CATCH | Number of fish caught |
| WEIGHTS | Weight of fish caught (lbs) |
| PREDATOR | Predators |
Attribute completeness
The figure below illustrates how consistently each attribute was filled out.
Number of logbooks by year
The figure below illustrates the number of logbook entries over time.
Attribute information and issues
Logbook id
The logbook id (“SerialNumber”) is a unique identifier for each logbook page. Note that a trip could potentially span multiple logbook pages or that a page could include multiple trips.
Vessel ids and name
The vessel id (“VesselID”) is a unique 5-digit identifier assigned to each vessel by CDFW. The vessel name (“VesselName”) is missing for many vessels, but the vessel id column is nearly complete. The missing names could be filled in by cross-referencing another dataset or by querying this resource.
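As a minimal sketch of the gap-filling idea, the Python below checks that a vessel id has five digits and fills missing vessel names from a lookup table. The function names, the `lookup` dictionary, and the use of the raw column names `VESSEL_ID` and `VESSEL_NAME` are assumptions made for illustration; the lookup would have to be built from whatever external vessel registry is available.

```python
import re


def is_valid_vessel_id(value):
    """Return True if the value looks like a 5-digit vessel id."""
    return re.fullmatch(r"\d{5}", str(value).strip()) is not None


def fill_vessel_names(records, lookup):
    """Fill missing VESSEL_NAME values from a vessel id -> name lookup.

    `records` is a list of dicts (one per logbook row); `lookup` is a
    hypothetical dict built from an external vessel registry.
    """
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input rows are not modified
        name = (rec.get("VESSEL_NAME") or "").strip()
        if not name:
            # Returns None when the id is not in the lookup
            rec["VESSEL_NAME"] = lookup.get(str(rec.get("VESSEL_ID")).strip())
        filled.append(rec)
    return filled
```

Names that cannot be matched remain missing (`None`) rather than being guessed.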
Permit id
The permit id (“PERMIT”) is a unique 5-digit identifier for the permit used for the fishing trip.
Block id
The block id (“FG_BLOCKS”) indicates the statistical reporting block where the majority of fishing occurred. This is the only spatial information recorded in the gillnet logbooks.
Net type
The net type, i.e., whether it is a set or drift gillnet, is reported in two columns: “DRIFT_SET” and “Final Net Type (Set, Drift)”. The second column is clean (all “drift” or “set”) but less complete. The first column is nearly complete but includes a number of invalid codes. I create a final determination using the second column preferentially over the first column.
The following codes in the first column can be interpreted and recoded:
- S/s = Set
- D/d = Drift
- 67 = Set (large-mesh set gillnet gear code)
- 68 = Set (small-mesh set gillnet gear code)
The following codes are unknown: 1, 2, 3, 5, H, N, Q, W, X
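The net type rule described above can be sketched as follows. This is an illustrative Python version, not the code used to produce the cleaned data; the function and dictionary names are hypothetical. It prefers the clean “Final Net Type (Set, Drift)” value and falls back to the interpretable “DRIFT_SET” codes, leaving unknown codes missing.

```python
# Interpretable DRIFT_SET codes listed in this guide (matched case-insensitively)
NET_TYPE_CODES = {"S": "set", "D": "drift", "67": "set", "68": "set"}


def clean_net_type(drift_set, final_net_type):
    """Return 'set', 'drift', or None when the net type cannot be determined."""
    # Prefer the cleaner (but less complete) final column
    final = (final_net_type or "").strip().lower()
    if final in ("set", "drift"):
        return final
    # Fall back to the raw code; unknown codes (1, 2, 3, 5, H, N, Q, W, X) give None
    code = "" if drift_set is None else str(drift_set).strip().upper()
    return NET_TYPE_CODES.get(code)
```

For example, `clean_net_type("s", None)` returns `"set"`, while `clean_net_type("X", None)` returns `None`.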
Depth and soak time
The depth (“DEPTHS”) is reported in fathoms and sometimes multiple depths are provided.
The soak times (“HOURS_NET_SOAKED”) are reported in hours and sometimes multiple soak times are provided.
There are unrealistic outliers in both values. Values of zero should be re-coded as NAs.
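One possible way to handle entries with multiple values and zeros is sketched below in Python. The raw separators are not documented here, so the sketch simply extracts every number it finds in the entry; that choice, the function names, and the use of the mean as a summary are all assumptions.

```python
import re


def parse_multi_value(raw):
    """Extract all numbers from a raw entry, dropping zeros as missing."""
    if raw is None:
        return []
    # Assumption: values are separated by any non-numeric characters
    numbers = [float(tok) for tok in re.findall(r"\d*\.?\d+", str(raw))]
    return [n for n in numbers if n > 0]


def summarize_entry(raw):
    """Return the mean of the reported values, or None if nothing usable remains."""
    values = parse_multi_value(raw)
    return sum(values) / len(values) if values else None
```

The mean is only one option. Depending on the analysis, the minimum, maximum, or the full list of values may be more appropriate.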
Mesh size, net length, buoy line depth
The mesh size (“MESH_SIZE”) is reported in inches and sometimes multiple values are provided.
The net length (“NET_LENGTH”) is reported in fathoms and sometimes multiple values are provided.
The buoy line depth (“BOUY_LINE_DEPTH”) is reported in feet and sometimes multiple values are provided.
There are unrealistic outliers in all three values. Values of zero should be re-coded as NAs.
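Outliers and zeros in the gear columns could be screened with a simple range check, as in the Python sketch below. The bounds in `PLAUSIBLE_RANGES` are placeholders invented for illustration, not values from CDFW or from this guide; they should be replaced with limits that are defensible for the fishery being analyzed.

```python
# Placeholder bounds (assumptions, not official limits): (low, high) per column
PLAUSIBLE_RANGES = {
    "MESH_SIZE": (1.0, 30.0),         # inches
    "NET_LENGTH": (1.0, 2000.0),      # fathoms
    "BOUY_LINE_DEPTH": (1.0, 500.0),  # feet
}


def clean_gear_value(column, value):
    """Return the value as a float, or None if it is missing, zero, or implausible."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None  # missing or non-numeric entry
    low, high = PLAUSIBLE_RANGES[column]
    if number == 0 or not (low <= number <= high):
        return None  # zero or outside the plausible range
    return number
```

Setting implausible values to missing, rather than dropping the rows, keeps the associated catch records available for analyses that do not depend on gear dimensions.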
Target species
The logbook asks fishers to specify the target species and to use the following official codes for common target species:
- B – Barracuda
- H – Halibut
- C – White croaker
- W – White seabass
- S – Shark/swordfish
- X – Soupfin shark
I assume the following assignments for two other codes:
- T – Thresher (I no longer recall the basis for this assignment)
- YELTL – Yellowtail
The following codes remain unmatched: 1, 4, 8, D, E, F, J, L, M, N, O, P, R, SW, Z
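The target species codes above can be applied with a small lookup, sketched here in Python. The dictionary and function names are hypothetical, and the last two entries carry the same uncertainty as the assumed assignments in the text.

```python
TARGET_SPECIES_CODES = {
    # Official codes printed on the logbook
    "B": "Barracuda",
    "H": "Halibut",
    "C": "White croaker",
    "W": "White seabass",
    "S": "Shark/swordfish",
    "X": "Soupfin shark",
    # Assumed assignments (not official codes)
    "T": "Thresher",
    "YELTL": "Yellowtail",
}


def decode_target_species(code):
    """Return the target species for a code, or None if the code is unmatched."""
    if code is None:
        return None
    return TARGET_SPECIES_CODES.get(str(code).strip().upper())
```

Unmatched codes such as “SW” or “Z” return `None`, so they stay visibly missing instead of being silently assigned.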
Species id and name
The species information is extremely poorly formatted.
The species information is spread across three columns:
- COMMON_NAME = Common name in format: “California halibut”
- FinalMLDS_Common_Name = Common name in format: “Halibut, California”
- MLDS_Species_Code = CDFW species code
However, entries within each column do not always follow these formats.
In general, I used information in the common name columns to fill in missing species codes and then attached nicely formatted common names based on the species codes. This requires heavy and careful formatting and I encourage users to check out my code.
I was unable to identify a species code for “harbor seal” or “unspecified sea urchin”. I was unable to identify a species for common names listed as: Sb, X, S, Grass Back, Verde, or Grass Bass. If anyone knows what species those are, please let me know.
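The general approach of filling missing species codes from the common names can be sketched as below. This is a simplified Python illustration, not the cleaning code referred to above: the function names are hypothetical, and `name_to_code` stands in for a species key that the user would need to supply.

```python
def reorder_common_name(name):
    """Convert 'Halibut, California' to 'California halibut'.

    Names without a comma are only normalized for whitespace and case.
    """
    if name is None:
        return None
    text = " ".join(str(name).split())  # collapse repeated whitespace
    if not text:
        return None
    if "," in text:
        head, _, tail = text.partition(",")
        text = f"{tail.strip()} {head.strip()}".strip()
    # Sentence case; note this also lowercases proper nouns after the first word
    return text[:1].upper() + text[1:].lower()


def fill_species_code(record, name_to_code):
    """Return the existing species code, or look one up from the common names."""
    code = record.get("MLDS_Species_Code")
    if code not in (None, ""):
        return code
    for column in ("COMMON_NAME", "FinalMLDS_Common_Name"):
        name = reorder_common_name(record.get(column))
        if name in name_to_code:
            return name_to_code[name]
    return None  # no match: leave the code missing
```

In practice the name matching needs many more special cases than this, which is why the unmatched names listed above remain unresolved.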
Catch (number, weight, status)
The catch is reported as the number (“NUM_CATCH”) and pounds (“WEIGHTS”) of fish caught. The status (“STATUS”) of the catch – whether it was kept, released, or lost – is also listed. The following status codes are not understood: 1, 2, 3, 4, 5.
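A simple way to keep the unexplained status codes from contaminating summaries is to group them separately, as in the Python sketch below. It assumes that valid statuses are spelled out as “kept”, “released”, or “lost” in the raw data, which may not hold; the function names are hypothetical.

```python
from collections import defaultdict

# Assumption: valid statuses appear as these words (case-insensitive)
KNOWN_STATUSES = {"kept", "released", "lost"}


def normalize_status(raw):
    """Return a known status in lowercase, or None for codes such as 1-5."""
    text = "" if raw is None else str(raw).strip().lower()
    return text if text in KNOWN_STATUSES else None


def total_catch_by_status(records):
    """Sum NUM_CATCH by status, grouping unrecognized statuses as 'unknown'."""
    totals = defaultdict(float)
    for rec in records:
        status = normalize_status(rec.get("STATUS")) or "unknown"
        try:
            count = float(rec.get("NUM_CATCH"))
        except (TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric count
        totals[status] += count
    return dict(totals)
```

Reporting the “unknown” total alongside the others makes it easy to see how much catch is affected by the unexplained codes.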
Predators
This column (“PREDATOR”) is supposed to list the number and species of fish lost to a type of predator. However, in practice, it appears to only list the type of predator present. It includes some erroneous entries such as: “used as bait”, “NMFS”, and “horne”.