CEFAS4e148f93-2982-4858-a418-8e7f9ded625d
English
dataset
Centre for Environment, Fisheries & Aquaculture Science
Data Manager
+44 (0)1502 562244
Cefas Lowestoft Laboratory
Pakefield Road
Lowestoft
Suffolk
NR33 0HT
UK
data.manager@cefas.co.uk
pointOfContact
2020-09-09T13:34:43
MEDIN Discovery Metadata Standard
Version 2.3.7
urn:ogc:def:crs:EPSG::4326
OGP
1969 - 2018 Centre for Environment, Fisheries & Aquaculture Science (Cefas) A new machine learning approach to seabed biotope classification
2020-09-09
publication
CEFAS4e148f93-2982-4858-a418-8e7f9ded625d
http://www.cefas.co.uk/
Files for use with the R script accompanying the paper Cooper (2020). Note
that this script also uses files from
`https://doi.org/10.14466/CefasDataHub.34`_ (details provided in script).
Cooper, K.M. (2020). A new machine learning approach to seabed biotope
classification. Science Advances.
.. _`https://doi.org/10.14466/CefasDataHub.34`: https://doi.org/10.14466/CefasDataHub.34
Centre for Environment, Fisheries & Aquaculture Science
Data Manager
+44 (0)1502 562244
Cefas Lowestoft Laboratory
Pakefield Road
Lowestoft
Suffolk
NR33 0HT
UK
data.manager@cefas.co.uk
originator
Centre for Environment, Fisheries & Aquaculture Science
Data Manager
+44 (0)1502 562244
Cefas Lowestoft Laboratory
Pakefield Road
Lowestoft
Suffolk
NR33 0HT
UK
data.manager@cefas.co.uk
custodian
notPlanned
Delimited
Geographic Information System
NDGO0005
Habitat
Habitat characterisation
Habitat extent
SeaDataNet P021 parameter discovery vocabulary
2011-03-25
revision
Benthos
Biodiversity
Ecology
Invertebrate
Conservation
Management
Monitoring
Sea bed
GEMET, version 1.0
2008-06-01
publication
Habitats and biotopes
GEMET - INSPIRE themes, version 1.0
2008-06-01
publication
Public data (Crown Copyright) - Open Government Licence Terms and Conditions apply
otherRestrictions
English
biota
SeaVoX Vertical Co-ordinate Coverages
2010-05-18
revision
Unknown
1.73881
1.74086
52.4581
52.4595
1969-03-30T23:00:00.000Z
2018-01-11T00:00:00.000Z
http://data.cefas.co.uk/#/View/19921/order
dataset
Files include: BiotopePredictionScript.R (R script), EUROPE.shp (European
Coastline), EuropeLiteScoWal.shp (European Coastline with UK boundaries),
DEFRADEMKC8.shp (Seabed bathymetry), C5922DATASETFAM13022017.csv (Training
dataset), PARTC16112018.csv (Test dataset), PARTCAGG16112018.csv (Aggregation
data). Description of C5922DATASETFAM13022017.csv: This file is based on the
RSMP dataset (see
https://www.cefas.co.uk/cefas-data-hub/dois/rsmp-baseline-dataset/), but with
macrofaunal data output at the level of family or above. A variety of gear
types were used for sample collection, including grabs (0.1 m² Hamon, 0.2 m²
Hamon, 0.1 m² Day, 0.1 m² Van Veen and 0.1 m² Smith-McIntyre) and cores. Across
these devices, 93% of samples were acquired using either a 0.1 m² Hamon grab
or a 0.1 m² Day grab. Sieve sizes used in sample processing include 1 mm and
0.5 mm, reflecting the conventional preference for 1 mm offshore and 0.5 mm
inshore. Of the samples collected using either a 0.1 m² Hamon grab or a 0.1 m²
Day grab, 88% were processed using a 1 mm sieve. Taxon names were standardised
according to the WoRMS (World Register of Marine Species) list using the Taxon
Match Tool (http://www.marinespecies.org/aphia.php?p=match). Of the initial
13,449 taxon names, only 774 remained after correction and aggregation to
family level. The final dataset comprises a single-sheet comma-separated
values (.csv) file. Colonial taxa accounted for less than 20% of the total
number of taxa and, where present, were given a value of 1 in the dataset. This
component of the fauna was missing from 325 of the 777 surveys, reflecting
either a true absence or simply that colonial taxa were ignored by the
analyst. Sediment particle size data were provided as percentage weight by
sieve mesh size, with the dataset including 99 different sieve sizes. Sediment
samples were processed either by sieving alone or by a combination of sieving
and laser diffraction. Key metadata fields include: Sample coordinates
(Latitude & Longitude), Survey Name, Gear, Date, Grab Sample Volume (litres)
and Water Depth (m). A number of additional explanatory variables are also
provided (salinity, temperature, chlorophyll a, Suspended particulate matter,
Water depth, Wave Orbital Velocity, Average Current, Bed Stress). In total,
the dataset dimensions are 33,198 rows (samples) x 900 columns
(variables/factors), yielding a matrix of 29,878,200 individual data values.
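
The dimensions quoted above can be checked with a few lines of code. The following is a minimal sketch in Python (standard library only), not part of BiotopePredictionScript.R. The helper name ``csv_dimensions``, the three-row in-memory sample and its column names are hypothetical. The sketch also assumes that the first line of the file is a header row and that every other line is one sample.

```python
import csv
import io


def csv_dimensions(handle):
    """Return (rows, columns) for a CSV stream, assuming one header row."""
    reader = csv.reader(handle)
    header = next(reader)          # assumed: first line holds the variable names
    rows = sum(1 for _ in reader)  # each remaining line is counted as one sample
    return rows, len(header)


# Hypothetical three-sample stand-in for the real file. The training dataset
# would instead be opened with open("C5922DATASETFAM13022017.csv", newline="").
sample = io.StringIO(
    "Sample,Latitude,Longitude\n"
    "A,52.1,1.7\n"
    "B,52.2,1.8\n"
    "C,52.3,1.9\n"
)
rows, cols = csv_dimensions(sample)
print(rows, cols, rows * cols)

# Dimensions quoted in the description: 33,198 samples x 900 variables.
quoted_values = 33_198 * 900
print(quoted_values)
```

The product of the quoted dimensions is 29,878,200, which agrees with the number of data values stated in the description.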