Project Highlights

  • Desired output: assisted interpretation of large, depth-related biostratigraphic datasets in terms of (i) biozones, which can be related to age; (ii) palaeoenvironment, expressed as a water-depth range, with implications for sequence stratigraphic interpretation; and (iii) identification of reworking/caving
  • Test various machine learning and data analytic approaches to determine the optimal approach for analysing a true big data challenge
  • Industry-led topic that integrates expertise across academic disciplines (Earth Sciences, Computing)



Quantitative or semi-quantitative biostratigraphic data (microfossil occurrences in well and outcrop samples) are routinely collected within the hydrocarbon industry, but interpreting the data in terms of age/biozone and palaeoenvironment (which then feeds into other aspects of the exploration process) can be laborious and requires access to specialist knowledge. Many companies no longer have sufficient specialist knowledge in-house, and historical biostratigraphic data have become an underutilised resource. A great deal of value could be released from these data if the process of interpretation were automated, allowing geoscientists to rapidly synthesise the results into depositional environment maps, for example. The digital form of most biostratigraphic data makes them suitable for machine learning techniques, using training datasets in which the interpretations have already been verified.

We propose to use datasets from the BGS (British Geological Survey) and IODP (International Ocean Discovery Program). Additionally, discussions are ongoing for the release of suitable training data from major oil companies. The aim of the project is to characterise the microfossil assemblage in a sample in terms of palaeoenvironment and age/biozone, with the confidence and range of uncertainty of each interpretation stated explicitly. Further outputs relevant to sequence stratigraphic interpretation, and the identification of reworking and caving (i.e. automating data cleaning as a pre-processing step), are also desirable.
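As a purely illustrative sketch of what "interpretation with confidence expressed" could look like, the Python example below trains a small multinomial naive Bayes classifier on taxon counts and returns a probability for each palaeoenvironment class. This is not the project's chosen method (the project is intended to compare a range of approaches); the taxon names, environment labels, counts and function names are all hypothetical and exist only to make the example runnable.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Fit a multinomial naive Bayes model.

    samples: list of (taxon_counts, label) pairs, where taxon_counts maps
    a taxon name to the number of specimens counted in that sample.
    """
    class_counts = Counter()             # number of training samples per label
    taxon_counts = defaultdict(Counter)  # summed specimen counts per label
    vocab = set()                        # every taxon seen in training
    for counts, label in samples:
        class_counts[label] += 1
        taxon_counts[label].update(counts)
        vocab.update(counts)
    return class_counts, taxon_counts, vocab


def predict_proba(model, counts, alpha=1.0):
    """Return {label: posterior probability} for one sample.

    alpha is the Laplace smoothing constant; taxa never seen in
    training are ignored rather than guessed at.
    """
    class_counts, taxon_counts, vocab = model
    n_samples = sum(class_counts.values())
    log_post = {}
    for label, n_label in class_counts.items():
        total = sum(taxon_counts[label].values()) + alpha * len(vocab)
        lp = math.log(n_label / n_samples)  # class prior
        for taxon, k in counts.items():
            if taxon in vocab:
                lp += k * math.log((taxon_counts[label][taxon] + alpha) / total)
        log_post[label] = lp
    # Normalise in log space to avoid underflow
    top = max(log_post.values())
    unnorm = {label: math.exp(v - top) for label, v in log_post.items()}
    z = sum(unnorm.values())
    return {label: v / z for label, v in unnorm.items()}


# Hypothetical toy training set: invented taxa and labels, not real data
TRAINING = [
    ({"taxon_A": 12, "taxon_B": 3}, "inner shelf"),
    ({"taxon_A": 9, "taxon_B": 1}, "inner shelf"),
    ({"taxon_B": 2, "taxon_C": 10}, "bathyal"),
    ({"taxon_C": 14, "taxon_D": 4}, "bathyal"),
]

model = train(TRAINING)
probs = predict_proba(model, {"taxon_A": 5, "taxon_C": 1})
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Returning the full probability dictionary, rather than only the top label, is what lets a geoscientist see how strongly one interpretation is preferred over the alternatives. Naive Bayes posteriors tend to be over-confident, so in practice they would need calibrating against verified interpretations.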

It is worth noting that this type of data has been captured by operators for many decades. As a consequence, there is a vast back catalogue of unstructured data that could be processed, cleaned and interrogated with these techniques, leading to new insights.
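One simple, rule-based way to approach the cleaning step is sketched below in Python, assuming that each taxon has a known stratigraphic range and that each sample has a provisional age. An occurrence is flagged as possible caving when the taxon first appeared after the sample was deposited (younger material that has fallen down the borehole), and as possible reworking when the taxon became extinct before the sample was deposited (older material that has been redeposited). The class name, function name and ages are hypothetical, and this rule is offered only as a baseline against which machine learning approaches might be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonRange:
    """Known stratigraphic range of a taxon, in millions of years ago (Ma)."""
    first_appearance_ma: float  # oldest occurrence (larger number)
    last_appearance_ma: float   # youngest occurrence (smaller number)


def flag_occurrence(sample_age_ma, taxon_range, tolerance_ma=0.0):
    """Classify one occurrence relative to the sample's provisional age.

    tolerance_ma widens the accepted range to allow for dating uncertainty.
    """
    if taxon_range.first_appearance_ma + tolerance_ma < sample_age_ma:
        # Taxon had not yet evolved when the sample was deposited
        return "possible caving"
    if taxon_range.last_appearance_ma - tolerance_ma > sample_age_ma:
        # Taxon was already extinct when the sample was deposited
        return "possible reworking"
    return "in range"


# Hypothetical ranges and a hypothetical 50 Ma sample, for illustration only
young_taxon = TaxonRange(first_appearance_ma=30.0, last_appearance_ma=10.0)
old_taxon = TaxonRange(first_appearance_ma=100.0, last_appearance_ma=66.0)
coeval_taxon = TaxonRange(first_appearance_ma=60.0, last_appearance_ma=40.0)

flags = [flag_occurrence(50.0, t) for t in (young_taxon, old_taxon, coeval_taxon)]
print(flags)
```

The rule depends on a provisional sample age, which is itself an interpretation, so in a real workflow the flagging and the age assignment would probably need to be iterated or solved jointly.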


Figure 1. Overview of the project design


Familiarisation with traditional best-practice approaches to biostratigraphic interpretation and with issues regarding data standardisation and the removal of spurious data/artefacts. Understanding the relative importance of different data types.

Familiarisation with the appropriate Machine Learning techniques

Building, cleaning and testing of a range of appropriate training datasets

Testing of a range of machine learning approaches/algorithms

Development of a robust workflow

Training and Skills

CENTA students are required to complete 45 days of training throughout their PhD, including a 10-day placement. In the first year, students will be trained as a single cohort on environmental science, research methods and core skills. Throughout the PhD, training will progress from core skill sets to master classes specific to the students' projects and themes.

The successful applicant will benefit from a placement at the Halliburton offices in Abingdon, Oxfordshire, where relevant training will be delivered by subject matter experts. The student will gain broad experience in the application of biostratigraphic analysis to hydrocarbon exploration and production. They will develop detailed knowledge of specific taxonomic groups and how these can be used to determine geological age and depositional environments. They will benefit from in-depth training in sequence stratigraphy and wireline log interpretation. The student will also develop skills in data analytics and machine learning techniques, using Java, C++, Python or the R environment. These skills are highly transferable, enabling the student to progress in any STEM discipline, and are in high demand in industry.


Year 1: Familiarisation and research into Machine Learning techniques and preparation/build of training datasets.

Year 2: Main phase of applying Machine Learning to the problem, leading to draft paper.

Year 3: Test on a wide variety of datasets (age, palaeoenvironment, preservation, fossil groups) to determine the best methodology.

Year 4: Determination of optimal workflow, further paper. PhD thesis completion.

Partners and collaboration (including CASE)

The project is a collaboration between academic and industrial partners. It is hosted by the University of Birmingham and led externally by Halliburton. Many key datasets will be provided by the British Geological Survey, which is also a partner on this project.

Further Details

See the CENTA web page for general information and details of how to apply (http://www.birmingham.ac.uk/generic/centa). For specific information on this project, contact the supervisors: Dr Ian Boomer (i.boomer@bham.ac.uk), Dr Mike Simmons (mike.simmons@halliburton.com), Helen Smyth (helen.smyth@halliburton.com).