The purpose of this guide is to orient scientists and other data users to California’s commercial squid logbook data. These data are confidential and are available only through authorized data-sharing agreements. This guide seeks to help data users anticipate how the data are formatted and how they may need to be “cleaned” (i.e., processed) before they can be used in analysis. I also highlight tricks and caveats gleaned from my own experience using the data. Ultimately, I hope this guide clarifies what data are available and how they can be processed to maximize their utility.
Overview
The California Department of Fish and Wildlife (CDFW) has been compiling catch statistics on California’s fisheries since 1916. CDFW has required commercial squid vessels to submit logbooks since 2000. These logbook data are used to monitor fishing locations, environmental conditions, fishing effort, catch amounts, the use of catch, and fleet characterization and capacity. Additional information is available here.
An example logbook is shown below.
This guide provides an overview of California’s squid logbook data based on a data request processed in January 2024 for all squid logbooks from 2000-2022. In this guide, I review the attributes of the squid logbook data, the steps required to clean the raw data, and some visualizations of non-confidential summaries of the squid data.
Data attributes
The squid logbook data include the following columns:
| Column | Description |
|---|---|
| LogSerialNumber | Logbook id |
| VesselID | Vessel id (e.g., 12345) |
| VesselName | Vessel name |
| CaptainID | Captain id (e.g., L12345) |
| CaptainName | Captain name |
| VesselPermitNumber | Permit number (e.g., SVT123) |
| Comments | Comments |
| LogDateString | Date |
| SetNumber | Set number |
| StartTime | Start time (24-hour) |
| EndTime | End time (24-hour) |
| ElapsedTime | Elapsed time (minutes) |
| SetPosition | GPS position |
| SetLatitude | Latitude (°N) |
| SetLongitude | Longitude (°W) |
| Temperature | Temperature (°F) |
| BottomDepth | Bottom depth (fathoms) |
| CatchEstimate | Catch estimate (short tons) |
| LtdByMarketOrder | Limited by market order? (y/n) |
| LightBrailSetUpon | Vessel id(s) of light boat(s) set upon |
| ByCatch | Bycatch (species – lbs) |
| LandingReceipts | Landing receipts |
Attribute completeness
The figure below illustrates how consistently each attribute was filled out.
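Completeness of this kind can be computed as the share of rows with a non-missing value in each column. The sketch below is one way to do it in plain Python, assuming each logbook row is a dictionary of strings (for example, as produced by `csv.DictReader`) and assuming that blanks and the literal string “NA” mark missing values; neither assumption comes from CDFW documentation.

```python
import math
from typing import Dict, Iterable, List

# Assumption: blanks and "NA" mark missing values in the extract.
MISSING_STRINGS = {"", "NA"}

def is_missing(value) -> bool:
    """True for None, float NaN, and blank/"NA" strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() in MISSING_STRINGS

def completeness(rows: List[dict], columns: Iterable[str]) -> Dict[str, float]:
    """Fraction of rows (0 to 1) with a non-missing value in each column."""
    n = len(rows)
    return {
        col: sum(not is_missing(row.get(col)) for row in rows) / n if n else 0.0
        for col in columns
    }
```

Note that this counts a latitude or longitude of “0” as present; since those zeros should be treated as unknown, you may want to add “0” to the missing set for the coordinate columns.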
Attribute information and issues
Logbook id
The logbook id (“LogSerialNumber”) is a unique identifier for each logbook page. Note that a single trip may span multiple logbook pages.
Vessel id and name
The vessel id (“VesselID”) is a 5-digit unique identifier assigned to each vessel by CDFW. The vessel name (“VesselName”) is the name of the vessel. The vessel id is nearly always reported, but the vessel name is missing more often. These missing names could be filled in by consulting another dataset or by querying this resource.
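Before turning to an outside source, some missing names can often be recovered from other rows for the same vessel. The sketch below is a hypothetical helper (`fill_names` is not part of any CDFW tool) that fills a blank name with the most frequently reported name for the same id elsewhere in the logbooks, then falls back to an optional external id-to-name lookup. It assumes values are read as strings (e.g., via `csv.DictReader`).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

def _clean(value) -> str:
    return "" if value is None else str(value).strip()

def fill_names(rows: List[dict], id_col: str, name_col: str,
               lookup: Optional[Dict[str, str]] = None) -> List[dict]:
    """Hypothetical helper: fill blank names from the same id's other rows,
    then from an optional external id -> name lookup."""
    # Tally every name reported for each id across the logbook rows.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        ident, name = _clean(row.get(id_col)), _clean(row.get(name_col))
        if ident and name:
            counts[ident][name] += 1
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left untouched
        ident = _clean(row.get(id_col))
        if ident and not _clean(row.get(name_col)):
            if ident in counts:
                # Most frequently reported name for this id in the logbooks.
                row[name_col] = counts[ident].most_common(1)[0][0]
            elif lookup and ident in lookup:
                # Fall back to an external id -> name table (assumed to exist).
                row[name_col] = lookup[ident]
        filled.append(row)
    return filled
```

The same call works for captains, e.g. `fill_names(rows, "CaptainID", "CaptainName")`. Vessels are sometimes renamed, so “most common name” is a judgment call worth spot-checking.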
Captain id and name
The captain id (“CaptainID”) is a six-character unique identifier assigned to each captain by CDFW, consisting of the letter “L” followed by five digits (e.g., “L12345”). The captain name (“CaptainName”) is consistently recorded as the first initial followed by the last name (e.g., “J SMITH”). The captain name is missing more often than the captain id.
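Because both captain fields follow fixed patterns, malformed entries can be flagged with regular expressions. The patterns below are assumptions inferred from the examples above (“L” plus five digits; an uppercase initial, a space, and an uppercase last name that may contain spaces, hyphens, or apostrophes), not a published specification.

```python
import re

# Assumed formats, inferred from the examples in this guide.
CAPTAIN_ID_RE = re.compile(r"L\d{5}")              # e.g. "L12345"
CAPTAIN_NAME_RE = re.compile(r"[A-Z] [A-Z'\- ]+")  # e.g. "J SMITH"

def valid_captain_id(value) -> bool:
    """True if the value is exactly "L" followed by five digits."""
    return isinstance(value, str) and CAPTAIN_ID_RE.fullmatch(value.strip()) is not None

def valid_captain_name(value) -> bool:
    """True if the value looks like an initial followed by a last name."""
    return isinstance(value, str) and CAPTAIN_NAME_RE.fullmatch(value.strip()) is not None
```

Rows that fail either check are candidates for manual review rather than automatic correction.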
Vessel permit
The vessel permit number (“VesselPermitNumber”) is the permit id associated with the vessel and is formatted like “SVT123”, where the three-letter prefix indicates the permit type. It is often not provided.
- SVT = Squid Vessel Transferable
- SVN = Squid Vessel Non-Transferable
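The permit type can be decoded from the prefix. This sketch assumes a permit number is exactly one of the two prefixes listed above followed by digits; anything else (including blanks) returns `None` so it can be reviewed by hand.

```python
import re
from typing import Optional

PERMIT_TYPES = {
    "SVT": "Squid Vessel Transferable",
    "SVN": "Squid Vessel Non-Transferable",
}
# Assumed shape: prefix followed by digits, e.g. "SVT123".
PERMIT_RE = re.compile(r"(SV[TN])(\d+)")

def permit_type(value) -> Optional[str]:
    """Return the permit type for a permit number, or None if blank/unrecognized."""
    if not isinstance(value, str):
        return None
    match = PERMIT_RE.fullmatch(value.strip().upper())
    return PERMIT_TYPES[match.group(1)] if match else None
```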
Comments
The comments (“Comments”) column contains free-text notes such as additional bycatch information, equipment problems, interference from other boats, weather-related problems, day set activity, etc. It also includes comments added by CDFW staff during data entry.
Date
The date (“LogDateString”) column describes the date of fishing.
Set number
The set number (“SetNumber”) column describes the numerical order of sets.
Start time, end time, elapsed time
The start (“StartTime”) and end (“EndTime”) times are in 24-hour format. They are often recorded with only 1, 2, or 3 digits; I expect the missing zeros were dropped from the left (e.g., “930” for 09:30), but they could have been dropped from the right. The elapsed time (“ElapsedTime”) column gives the duration of the fishing effort in minutes. I have yet to test how well the reported duration matches a duration derived from the reported start and end times.
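A minimal sketch for that comparison, assuming (as above, but unverified) that missing zeros were dropped from the left, so values are left-padded to four digits before being split into hours and minutes. It also assumes that an end time earlier than the start time means the set crossed midnight. Impossible values (e.g., minutes above 59 or a value of 2400) return `None`.

```python
from typing import Optional

def parse_hhmm(value) -> Optional[int]:
    """Convert a 24-hour HHMM value to minutes after midnight.

    Assumes dropped zeros are on the LEFT: "930" -> 09:30, "5" -> 00:05.
    Returns None for missing or impossible times.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text.endswith(".0"):  # numeric columns read as floats, e.g. 930.0
        text = text[:-2]
    if not text.isdigit() or len(text) > 4:
        return None
    text = text.zfill(4)  # left-pad: "930" -> "0930"
    hours, minutes = int(text[:2]), int(text[2:])
    if hours > 23 or minutes > 59:
        return None
    return hours * 60 + minutes

def derived_duration(start, end) -> Optional[int]:
    """Minutes from start to end; an end earlier than the start wraps past midnight."""
    s, e = parse_hhmm(start), parse_hhmm(end)
    if s is None or e is None:
        return None
    return (e - s) % (24 * 60)
```

Rows where `derived_duration(StartTime, EndTime)` differs markedly from `ElapsedTime` are worth inspecting; a high mismatch rate might suggest that some zeros were dropped from the right instead.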
GPS position, latitude, longitude
The GPS position (“SetPosition”) is reported in the format “XX°XX.XXX’ XXX°XX.XXX’” (degrees and decimal minutes of latitude, then longitude). A few missing latitude (“SetLatitude”) and longitude (“SetLongitude”) values can be derived from this field. The value “0” frequently occurs in the latitude and longitude columns and should be treated as unknown (“NA”).
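Converting the position string is a degrees-plus-minutes/60 calculation. The sketch below assumes latitude comes first (°N) and longitude second (°W), and that the minute mark may be a straight or curly apostrophe or absent; it returns west longitudes as negative numbers, the usual signed convention. The helper `clean_coordinate` applies the “0 means unknown” rule to the existing latitude and longitude columns.

```python
import math
import re
from typing import Optional, Tuple

# Two degree/decimal-minute pairs, e.g. "34°01.250’ 119°20.500’".
# Assumption: latitude (°N) first, longitude (°W) second.
POSITION_RE = re.compile(
    r"(\d{1,2})\s*°\s*(\d{1,2}(?:\.\d+)?)\s*['’′]?\s*"
    r"(\d{1,3})\s*°\s*(\d{1,2}(?:\.\d+)?)\s*['’′]?"
)

def parse_set_position(text) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) in decimal degrees, west longitudes negative."""
    if not isinstance(text, str):
        return None
    match = POSITION_RE.search(text)
    if match is None:
        return None
    lat = int(match.group(1)) + float(match.group(2)) / 60
    lon = -(int(match.group(3)) + float(match.group(4)) / 60)
    return lat, lon

def clean_coordinate(value) -> Optional[float]:
    """Treat 0, blanks, and non-numeric values as unknown (None)."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    if math.isnan(number) or number == 0:
        return None
    return number
```

If “SetLongitude” is stored as a positive °W value in your extract, compare it with the absolute value of the parsed longitude. A position string of all zeros parses to (0.0, -0.0) and should be passed through `clean_coordinate` as well.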
The coordinates often fall on land or in other implausible locations. They could potentially be corrected by comparing them with the coordinates reported by the light boats for the same fishing trip, or by checking whether they fall within the block id reported on the landing receipts.
Depth and temperature
These columns give the bottom depth (“BottomDepth”, in fathoms) and surface temperature (“Temperature”, in °F) of the water where the brail was set.
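For analyses in metric units, the conversions are exact: 1 fathom is 6 feet and 1 foot is 0.3048 m (so 1 fathom = 1.8288 m), and °C = (°F − 32) × 5/9. A minimal sketch:

```python
FEET_PER_FATHOM = 6
METERS_PER_FOOT = 0.3048  # exact international foot

def fathoms_to_meters(fathoms: float) -> float:
    """Convert a depth in fathoms to meters (1 fathom = 1.8288 m)."""
    return fathoms * FEET_PER_FATHOM * METERS_PER_FOOT

def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature in °F to °C."""
    return (deg_f - 32) * 5 / 9
```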
Catch estimate and order limit
The catch estimate (“CatchEstimate”) column provides an estimate of the catch in short tons. A short ton is equivalent to 2000 pounds (lbs).
The market order column (“LtdByMarketOrder”) indicates whether the catch was limited by a market order (yes/no). It is often not provided.
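Two small cleaning steps for this section are sketched below: converting short tons to metric tons (2,000 lb × 0.45359237 kg/lb = 907.18474 kg per short ton) and normalizing the market-order flag. Because the exact codes used in the extract are uncertain, the flag helper assumes any of “y”, “yes”, “n”, or “no” (any case) may appear, and maps everything else, including blanks, to `None`.

```python
from typing import Optional

POUNDS_PER_SHORT_TON = 2000
KG_PER_POUND = 0.45359237  # exact avoirdupois pound

def short_tons_to_metric_tons(short_tons: float) -> float:
    """1 short ton = 907.18474 kg = 0.90718474 metric tons."""
    return short_tons * POUNDS_PER_SHORT_TON * KG_PER_POUND / 1000

def normalize_yes_no(value) -> Optional[bool]:
    """Map y/yes and n/no (any case) to True/False; anything else to None."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in {"y", "yes"}:
        return True
    if text in {"n", "no"}:
        return False
    return None
```

Keeping unknowns as `None` rather than `False` preserves the distinction between “not limited” and “not reported”, which matters given how often this field is blank.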
Id of light boat set upon
This column (“LightBrailSetUpon”) provides a comma-separated list of the vessel ids of the light boat(s) on which the brail was set, e.g., “12345, 23456, 34567”. If no light boat was used, this field is empty.
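To link sets to light boats (for example, to compare coordinates), the list can be split into individual ids. A minimal sketch, keeping ids as strings so they match “VesselID” values read as text:

```python
import math
from typing import List

def split_ids(value) -> List[str]:
    """Split a comma-separated id field into trimmed ids; blank or missing -> []."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return []
    # Keep ids as strings so any leading zeros survive.
    return [part.strip() for part in str(value).split(",") if part.strip()]
```

If a spreadsheet reader has turned single-id cells into floats (e.g., 12345.0), convert those back to integer strings before matching.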
Bycatch
This column (“ByCatch”) provides a comma-separated list of bycatch species and amounts using the syntax “SPECIES CODE – SPECIES NAME : pounds of bycatch”. An example entry is: “51 – Mackerel, Pacific : 152, 55 – Mackerel, Jack : 152, 100 – Sardine, Pacific : 50”. Because species names themselves contain commas, the field cannot be split on commas alone. I have not parsed these data yet.
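One possible parsing approach, not yet tested against the full dataset: rather than splitting on commas, match each “code – name : pounds” entry with a regular expression. The pattern assumes every entry follows exactly the syntax above (an en dash or hyphen after the code, a colon before the pounds); entries that deviate may be merged with a neighbor or skipped, so results should be checked against the raw strings.

```python
import re
from typing import List, Tuple

# One entry looks like "51 – Mackerel, Pacific : 152". The name part may
# contain commas but not colons, which is what separates it from the pounds.
BYCATCH_RE = re.compile(r"(\d+)\s*[–-]\s*([^:]+?)\s*:\s*(\d+(?:\.\d+)?)")

def parse_bycatch(value) -> List[Tuple[int, str, float]]:
    """Return (species_code, species_name, pounds) tuples; blank -> []."""
    if not value or not isinstance(value, str):
        return []
    return [
        (int(code), name.strip(), float(pounds))
        for code, name, pounds in BYCATCH_RE.findall(value)
    ]
```

Applied to the example entry, this yields three records (codes 51, 55, and 100) totaling 354 lbs, which can then be reshaped into one row per set and species.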
Landing receipts
This column (“LandingReceipts”) provides a comma-separated list of the landing receipt ids associated with the catch. An example entry is: “W123456, W234567, W345678”.
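To join logbooks to landing receipt data, it can help to build a link table with one row per logbook page and receipt id. The sketch below assumes the logbook rows are dictionaries with “LogSerialNumber” and “LandingReceipts” keys, and deduplicates pairs on the assumption that several sets on the same page may list the same receipts.

```python
from typing import Iterable, List, Tuple

def receipt_links(rows: Iterable[dict]) -> List[Tuple[str, str]]:
    """Unique (LogSerialNumber, receipt id) pairs, one per receipt listed on a row."""
    links = set()
    for row in rows:
        field = row.get("LandingReceipts")
        if not isinstance(field, str):
            continue  # skip missing values
        for receipt in field.split(","):
            receipt = receipt.strip()
            if receipt:
                links.add((str(row["LogSerialNumber"]), receipt))
    return sorted(links)
```

The resulting pairs can be matched against the receipt ids in the landing receipt data, which would also supply the block id mentioned in the coordinate discussion.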