The purpose of this guide is to orient scientists and other data users to California’s commercial sea cucumber trawl logbook data. These data are confidential and are only provided through authorized data sharing agreements. This guide seeks to help scientists and other data users anticipate how these data are formatted and how they may have to “clean” (a.k.a., process) the data before they can be used in analysis. I also seek to highlight special tricks and caveats gleaned from my experience using the data. Ultimately, I hope this guide will help clarify what data are available and how these data can be processed to maximize their utility.
Overview
The California Department of Fish and Wildlife (CDFW) has been compiling catch statistics on California’s fisheries since 1916. CDFW has required commercial sea cucumber trawl vessels to submit logbooks since 1982. These logbook data record start and end haul locations, time, depth, and duration of trawl tows, total catch by species market category, and gear used. Additional information is available here.
This guide provides an overview of California’s commercial sea cucumber trawl logbook data based on a data request processed in January 2024 for all sea cucumber trawl logbooks from 1982-2022. In this guide, I review the attributes of the sea cucumber trawl logbook data, the steps required to clean the raw data, and some visualizations of non-confidential summaries of the data.
Data attributes
The sea cucumber trawl logbook data include the following columns:
Column | Description |
LogSerialNum | Logbook id |
FisherNum | Fisher id (e.g., L12345) |
VesselNum | Vessel id (e.g., 12345) |
VesselName | Vessel name |
DepartureDate | Date of departure |
LandingDate | Date of landing |
PortCode | Port code |
PortDesc | Port name |
NetTypeDesc | Net type (single- or double-rigged) |
HeadropeLength | Head rope length (ft) |
OldYear | Year |
DetailDate | Date of tow |
DragNumber | Tow number |
BlockNumber | Block id |
SetTime | Time at start of tow (HH:MM:SS AM/PM) |
SetDepth | Depth (fathoms) at start of tow |
UpTime | Time at end of tow (HH:MM:SS AM/PM) |
UpDepth | Depth (fathoms) at end of tow |
TotalTime | Duration (minutes) |
SpeciesCode | Species id |
MarketCatDesc | Species name |
TotalPounds | Catch (lbs) |
DetailComments | Comments |
SetLatDeg | Latitude degrees at start of tow |
SetLatDec | Latitude minutes at start of tow |
SetLongDeg | Longitude degrees at start of tow |
SetLongDec | Longitude minutes at start of tow |
UpLatDeg | Latitude degrees at end of tow |
UpLatDec | Latitude minutes at end of tow |
UpLongDeg | Longitude degrees at end of tow |
UpLongDec | Longitude minutes at end of tow |
SetLoranCx | LORAN-C x value at start of tow |
SetLoranCy | LORAN-C y value at start of tow |
SetLoranCw | LORAN-C w value at start of tow |
UpLoranCx | LORAN-C x value at end of tow |
UpLoranCy | LORAN-C y value at end of tow |
UpLoranCw | LORAN-C w value at end of tow |
SetLoranAmin | LORAN-A min at start of tow |
SetLoranAmax | LORAN-A max at start of tow |
UpLoranAmin | LORAN-A min at end of tow |
UpLoranAmax | LORAN-A max at end of tow |
Attribute completeness
The figure below illustrates how consistently each attribute was filled out.
Data completeness through time
The following figure shows data completeness through time. The most challenging part is that the logbooks don’t report species and catch until 2010. I think this is a database error as opposed to the logbooks never having asked for this information pre-2010.
Number of logbooks over time
Attribute information and issues
Logbook id
The logbook id (“LogSerialNum”) is a unique identifier for each logbook page. Note that a trip could potentially span multiple logbook pages.
Vessel id and name
The vessel id (“VesselNum”) is a 5-digit unique identifier assigned to each vessel by CDFW. The vessel name (“VesselName”) is the name of the vessel. Missing values could be filled through examination of another dataset or by querying this resource.
Fisher id
The fisher id (“FisherNum”) is a 6-character unique identifier (the letter “L” followed by five digits) assigned to each captain by CDFW and is formatted as follows: “L12345”. In this dataset, many ids are missing the leading “L”.
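The missing leading “L” can be restored with a small helper. The following is a minimal Python sketch, not the cleaning code used for this guide; the function name `normalize_fisher_id` is hypothetical, and it assumes that a valid id is always “L” followed by exactly five digits. Anything else is returned as `None` so it can be reviewed by hand.

```python
import re
from typing import Optional

def normalize_fisher_id(raw: Optional[object]) -> Optional[str]:
    """Return the id as 'L' + five digits, or None if it does not fit that pattern."""
    if raw is None:
        return None
    # Ids read from a spreadsheet may arrive as integers, so coerce to text first.
    text = str(raw).strip().upper()
    # Assumption: a valid id is an optional leading "L" followed by exactly five digits.
    match = re.fullmatch(r"L?(\d{5})", text)
    if match is None:
        return None
    return "L" + match.group(1)

print(normalize_fisher_id("12345"))   # id missing its leading "L"
print(normalize_fisher_id("l12345"))  # lower-case prefix
print(normalize_fisher_id("1234"))    # too short, left for manual review
```

Returning `None` rather than guessing (for example, by zero-padding short ids) keeps questionable values visible instead of silently repairing them.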
Departure, landing, and tow dates
These columns describe the date of the tow (“DetailDate”) as well as the date of departure (“DepartureDate”) and landing (“LandingDate”). The date of departure and landing were not frequently reported in the early years of the logbooks.
Port code and name
The port of landing is identified using a 3-digit code (“PortCode”). I provide a key to relate these codes to their names and complexes here. The name of the port is provided in the “PortDesc” column. This information is missing from a large number of entries.
Head rope length (ft)
The head rope length (“HeadropeLength”) is provided in feet. Head rope lengths <40 ft and >100 ft might not be correct.
The head rope is the line at the top of the trawl mouth (see diagram below).
Net type
The net type (“NetTypeDesc”) is either (a) single-rigged or (b) double-rigged. The diagram below shows several types of trawl rigs (McHugh et al. 2017).
Comments
The comments (“Comments”) column includes a lot of useful information that would ideally be parsed into the database. For example, it includes lots of catch information that is missing from the database. I have not attempted this.
Tow number
The tow number (“DragNumber”) column describes the order of the tows.
Start time, end time, and tow duration
The start (“SetTime”) and end (“UpTime”) times are in 12-hour format (“HH:MM:SS AM/PM”), but I converted them to numeric times for analysis. The duration (“TotalTime”) column describes the duration of the fishing effort in minutes. The reported duration perfectly matches the duration derived from the reported start and end times.
I suspect some of the zero (midnight) start and end times may be “unknowns” rather than true midnight values based on the shape of the distribution. This may be diagnosable but I haven’t explored this yet.
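The time conversion and duration check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for this guide: the function names are hypothetical, the times are assumed to follow the “HH:MM:SS AM/PM” pattern exactly, and an end time earlier than the start time is assumed to mean the tow crossed midnight.

```python
from datetime import datetime

def to_decimal_hours(clock: str) -> float:
    """Convert 'HH:MM:SS AM/PM' to hours since midnight (e.g., 1:30 PM -> 13.5)."""
    parsed = datetime.strptime(clock.strip().upper(), "%I:%M:%S %p")
    return parsed.hour + parsed.minute / 60 + parsed.second / 3600

def derived_minutes(set_time: str, up_time: str) -> float:
    """Tow duration in minutes derived from the start and end clock times."""
    start = to_decimal_hours(set_time)
    end = to_decimal_hours(up_time)
    # Assumption: an end time earlier than the start means the tow crossed midnight.
    if end < start:
        end += 24
    return (end - start) * 60

def is_midnight(clock: str) -> bool:
    """True for 12:00:00 AM, which may be a placeholder for an unknown time."""
    return to_decimal_hours(clock) == 0.0

print(derived_minutes("08:00:00 AM", "09:30:00 AM"))
print(derived_minutes("11:30:00 PM", "12:15:00 AM"))
print(is_midnight("12:00:00 AM"))
```

Comparing `derived_minutes` against the reported “TotalTime” is one way to repeat the consistency check, and counting records where `is_midnight` is true for “SetTime” or “UpTime” is a starting point for diagnosing suspicious midnight values.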
Start and end depth
The depths at the start (“SetDepth”) and end (“UpDepth”) of the tow are reported in fathoms. Depths deeper than 250 fathoms might be typos.
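The plausibility thresholds mentioned for head rope length (40 to 100 ft) and depth (250 fathoms) can be applied as flags rather than filters, so that suspect values stay visible for review. Below is a minimal Python sketch; the function name `flag_value` and the flag labels are hypothetical, and the lower depth bound of zero is my assumption rather than something stated in the logbook documentation.

```python
import math
from typing import Optional

# Plausible ranges taken from the text; treat them as review thresholds, not hard rules.
PLAUSIBLE_RANGES = {
    "HeadropeLength": (40.0, 100.0),  # feet
    "SetDepth": (0.0, 250.0),         # fathoms (lower bound of 0 is an assumption)
    "UpDepth": (0.0, 250.0),          # fathoms (lower bound of 0 is an assumption)
}

def flag_value(column: str, value: Optional[float]) -> str:
    """Return 'missing', 'ok', or 'suspect' for a single value in a given column."""
    if value is None or math.isnan(value):
        return "missing"
    low, high = PLAUSIBLE_RANGES[column]
    return "ok" if low <= value <= high else "suspect"

print(flag_value("SetDepth", 120))        # within the plausible range
print(flag_value("UpDepth", 300))         # deeper than 250 fathoms
print(flag_value("HeadropeLength", 35))   # shorter than 40 ft
```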
Set and up location
The coordinates for the start and end of a tow are sometimes reported. They are reported mostly in GPS coordinates but some are reported in LORAN-C coordinates.
The GPS coordinates split the degrees and minutes portions of the coordinate and need to be merged. The longitudes also need to be multiplied by -1. Many of the coordinates are invalid or unrealistic. The unrealistic ones could be filtered by seeing if they fall within the reported block id.
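Merging the degrees and minutes columns can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the “Dec” columns are assumed to hold decimal minutes (e.g., 15.5), and the bounding box is a rough, illustrative screen for waters off California that I chose for the example. Checking coordinates against the reported block id, as suggested above, would be a stricter test.

```python
from typing import Optional

def to_decimal_degrees(degrees: Optional[float], minutes: Optional[float],
                       west: bool = False) -> Optional[float]:
    """Combine degrees and decimal minutes; negate the result for western longitudes."""
    if degrees is None or minutes is None:
        return None
    # Minutes outside [0, 60) cannot be valid, so treat the coordinate as unusable.
    if not 0 <= minutes < 60:
        return None
    value = abs(degrees) + minutes / 60
    return -value if west else value

def in_rough_bounds(lat: Optional[float], lon: Optional[float]) -> bool:
    """Illustrative screen only: a generous box around California waters (assumed limits)."""
    if lat is None or lon is None:
        return False
    return 32.0 <= lat <= 42.5 and -126.0 <= lon <= -117.0

lat = to_decimal_degrees(34, 15.0)               # e.g., SetLatDeg and SetLatDec
lon = to_decimal_degrees(119, 30.0, west=True)   # e.g., SetLongDeg and SetLongDec
print(lat, lon, in_rough_bounds(lat, lon))
```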
I’ve written an R function to use this NOAA web tool to convert the LORAN-C coordinates to GPS coordinates. The function is ‘loran_to_gps’ in the wcfish R package (run ‘?loran_to_gps’ in R to view its help page).
Interestingly, many years have near 100% complete GPS coordinate data while other years are just missing these data entirely. It’s not totally clear to me why this is.
Species id and name
There are 53 species documented in the logbook catch.
The following five species are dominant: giant red sea cucumber, ridgeback prawn, unspecified sole, unspecified rock crab, and California lizardfish.
CDFW suggests that warty sea cucumber and unspecified sea cucumber are both likely to actually be giant red sea cucumber:
“Giant red sea cucumber (Apostichopus californicus), also known as California sea cucumber, is the primary target of the sea cucumber trawl fishery in California. Warty sea cucumber (Apostichopus parvimensis) is occasionally reported on trawl logs and is included in this data set. However, trawl data that includes warty sea cucumber is likely erroneous due to the shallower distribution of warty sea cucumber relative to where the trawl fishery operates. Unspecified sea cucumber that are reported on trawl logs are presumably mis-identified giant red sea cucumber as well.”
Also note that the species and catch are not recorded prior to 2010.
“The recording of a standardized species code and species name was not implemented until February 2010 as indicated in the data fields, “SpeciesCode” and “MarketCatDesc.” The species associated with trawl deployment prior to February 2010 reported in these data fields are completely missing; however, notes provided in the data fields “Comments” and “DetailComments” may be used as an alternative for some records to determine the target species. Although using these comment fields to determine species is challenging since there are many inconsistencies in how species are reported (i.e., cucumber, cuke, CQ, species code 755, etc.).”
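As a starting point for the comment-based approach described in the quote, the spellings it lists (cucumber, cuke, CQ, species code 755) can be matched with a regular expression. This is a rough Python sketch that I have not validated against the logbook data; the function name `mentions_sea_cucumber` is hypothetical, and the pattern covers only the variants named above. Note that a bare “755” could also match a reported weight, so matches should be reviewed rather than trusted.

```python
import re
from typing import Optional

# Variants named in the CDFW note; real comments likely contain other spellings.
# Caution: a bare "755" may also match a weight or count rather than a species code.
CUCUMBER_PATTERN = re.compile(r"\b(cucumbers?|cukes?|cq|755)\b", re.IGNORECASE)

def mentions_sea_cucumber(comment: Optional[str]) -> bool:
    """True if the comment contains one of the known sea cucumber spellings."""
    if not comment:
        return False
    return CUCUMBER_PATTERN.search(comment) is not None

for text in ["200 lbs cukes", "CQ 150", "ridgeback prawn only", None]:
    print(repr(text), mentions_sea_cucumber(text))
```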
Catch (lbs)
The catch (“TotalPounds”) is recorded in pounds (lbs). The condition of the catch – i.e., whether it is cut (eviscerated) or whole – is not recorded. Also, fishermen will sometimes report the total catch for the day rather than the catch for a specific tow. In these cases, CDFW evenly divides the catch among tows and notes this in the “Comments” field.
See above about how the catch is not reported before 2010.