We asked Dr Sebastian Bartsch, Head of Bioinformatics at c-LEcta, to explain how artificial intelligence is used to support enzyme engineering. Together with his colleagues, he develops the bioinformatics software ASSET-DB (Analysis SyStem and Engineering Tool & DataBase), which is part of c-LEcta’s proprietary technology platform ENESYZ.
At the beginning of every enzyme development project, the question is what exactly the enzyme should do and what properties it needs for this purpose. c-LEcta’s ENESYZ technology platform comes into play at this very first step with our MDM analysis. MDM stands for "Multi-Dimensional Mutagenesis". It analyzes numerous properties of the enzymes in question and uses various bioinformatics methods to calculate which enzyme variants should be tested for improved properties. Based on this analysis, the work in the wet lab begins. The platform supports the scientists in designing, conducting and evaluating the experiments and reduces the amount of laboratory work required.
To identify suitable enzymes from the outset, we use molecular dynamics simulations, among other techniques, to predict properties that are not yet known. This allows us to predict how a specific substrate binds, or to identify the precise positions where amino acid substitutions are needed to enable binding.
Certain experience- and knowledge-based selection procedures for variants are already implemented in the MDM analysis. Artificial intelligence, however, needs a large number of measurements and a large amount of sequence information to learn from, which is why the methods used for this purpose are also called machine learning (ML). We collect this data experimentally in the laboratory, and it flows back automatically into the database of the ASSET-DB bioinformatics platform. This platform brings together all data and analyses under one roof and helps the scientists to analyze and compare individual variants. The data obtained in this process offers the opportunity to gain insight into which mutations influence the respective enzyme and how. This is where machine learning comes in, as the relationships between the sequence and function of an enzyme are very complex. We are currently using and optimizing techniques that can capture these complex relationships and apply them to the prediction of improved enzymes. One example of the use of machine learning is predicting how different variants of an enzyme can be recombined. When multiple mutations, each of which improves enzyme properties individually, are combined randomly, the combination often has a detrimental effect on the enzyme: the mutations work against each other instead of with each other. With the help of the information from our database, additional bioinformatics analysis, and machine learning, we can predict the interactions of the different mutations. The most advantageous combination can thus be predicted, making it no longer necessary to screen millions of combinations in the laboratory at great effort and expense.
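The recombination problem described above can be illustrated with a small sketch. It is not c-LEcta's method or the ASSET-DB implementation, only a minimal example under stated assumptions: four hypothetical mutations labelled A to D, invented relative activities for the wild type, all single mutants and all double mutants, and a simple linear model with pairwise interaction terms fitted by least squares. The model then ranks the combinations that were never measured.

```python
from itertools import combinations

import numpy as np

# Hypothetical single-letter mutation labels (assumption, not from the article).
MUTATIONS = ["A", "B", "C", "D"]
PAIRS = list(combinations(range(len(MUTATIONS)), 2))

# Invented relative activities (wild type = 1.0), for illustration only.
# Every single mutation helps on its own, but A and B work against each other.
measured = {
    "": 1.0,
    "A": 1.5, "B": 1.4, "C": 1.3, "D": 1.2,
    "AB": 0.9, "AC": 1.8, "AD": 1.7, "BC": 1.7, "BD": 1.6, "CD": 1.8,
}


def encode(variant):
    """Feature vector: intercept, one flag per mutation, one flag per pair."""
    x = [1.0 if m in variant else 0.0 for m in MUTATIONS]
    pair_terms = [x[i] * x[j] for i, j in PAIRS]
    return [1.0] + x + pair_terms


def fit(measurements):
    """Least-squares fit of single-mutation effects and pairwise interactions."""
    X = np.array([encode(v) for v in measurements])
    y = np.array(list(measurements.values()))
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def predict(weights, variant):
    """Predicted relative activity of a variant under the fitted model."""
    return float(np.dot(encode(variant), weights))


def rank_unmeasured(weights, measurements):
    """Return all unmeasured combinations, best predicted activity first."""
    candidates = [
        "".join(combo)
        for size in range(len(MUTATIONS) + 1)
        for combo in combinations(MUTATIONS, size)
    ]
    unmeasured = [v for v in candidates if v not in measurements]
    return sorted(unmeasured, key=lambda v: predict(weights, v), reverse=True)


weights = fit(measured)
ranking = rank_unmeasured(weights, measured)
best = ranking[0]
print("Best predicted combination:", best)
```

With these invented numbers, simply stacking all four beneficial mutations is not the top-ranked choice, because the model has learned that A and B interfere with each other. Real sequence-function relationships are far more complex than a pairwise model, so this only shows the principle of predicting combinations from a limited set of measurements.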
However, we have also learned from experience that, given the current state of knowledge, one should not rely exclusively on AI and machine learning. There are still plenty of projects where not enough high-quality data can be acquired, or where the ML models are not yet sufficiently precise. This is when classic bioinformatics approaches and the scientific intuition of experienced staff are needed.
At c-LEcta, we follow a knowledge-based approach to enzyme engineering, which combines databases, informatics tools and machine learning with data from the wet lab and accurate analytical methods. This contrasts with (ultra-)high-throughput screening, where millions of variants are screened by highly automated machines in a short period of time, which requires a lot of machinery and technical effort. In addition, the resulting data is often of lower quality, as trade-offs between process-relevant reaction conditions and throughput are usually decided in favor of throughput and automation compatibility. From our point of view, however, it is important to analyze the variants under conditions that are as similar as possible to the industrial process in which the enzyme will be used, and to accurately determine multiple properties of the enzyme variants in parallel. Therefore, we focus primarily on high-quality data rather than quantity, i.e., we would rather screen fewer variants but generate data with a high informative value. By using a very efficient, structured, knowledge-based approach, we can concentrate on a smaller number of variants (usually <1000). As the generation of accurate lab data is the most time-consuming part, this means that the information obtained in the lab can be reused much faster for a further engineering iteration. We also usually need only very few iterations until the enzymes reach the desired properties. This significantly increases our development speed and reduces development costs.
Dr Sebastian Bartsch is a biochemist and has been with c-LEcta for more than 10 years.
After starting his career as a project manager, he has increasingly focused on the topics of structural analysis of enzymes and bioinformatics.
In addition to laboratory staff, a bioinformatician joined his team in 2015 and a software developer in 2020. Since January 2022, he has been heading the bioinformatics department at c-LEcta. Dr Bartsch sees himself as a mediator between the worlds of applied biochemistry and theoretical computer science. In this role, he helps interpret data from the lab and, together with his collaborators, builds tools to support enzyme development.