Cracking the Proteomic Code: Can Our Genes Also Tell Proteins How to Fold?

When scientists talk about the genetic code, they usually mean the famous set of rules by which DNA is “translated” into proteins, the molecular machines of life. But what if that’s only half the story?

A new study by NanoPrecMed member Andrew Miller and his team introduces a bold idea: our genetic code might also contain instructions that help proteins fold into their correct 3D shapes and even determine how they interact with each other. They call this the “universal proteomic code.”

From Sequence to Shape: Why Folding Matters

We all learn in school that DNA carries the instructions for building proteins, the workhorses of the body. Each set of three DNA letters (a codon) specifies one of 20 amino acids or a “stop” signal. String enough codons together, and you code for a protein’s amino acid sequence. But proteins don’t work as floppy strings. They need to fold into precise 3D structures to become enzymes, antibodies, or muscle fibers. A misfolded protein cannot do its job and can even cause disease, as in Alzheimer’s or Parkinson’s disease.

Folding is complex, and predicting how a sequence will fold has been one of biology’s biggest challenges, usually tackled with complex, time- and resource-intensive folding simulations. Modern AI tools have made incredible progress in predicting how proteins fold, but they are still something of a black box: most (but not all) of the time, they show what happens, but not why.

That’s where this new study comes in.

A Different Kind of Code

The study, led by Yazan Haddad and Andrew Miller at Mendel University in the Czech Republic, focused on a hypothesis Miller first proposed more than 20 years ago: that there should be an amino acid interaction code (a “proteomic code” or “second genetic code”) embedded in, or deriving from, the well-known genetic code.

Building on this hypothesis, the team re-examined the genetic code and applied one main biophysical principle and two genetic principles to define three distinct, new groups of amino acid pairing models (GU, Transmuted and Shift, respectively).¹ They then found these pairings at statistically highly significant levels in all the main protein structures, which points to high biological relevance for protein structure and function in general. For these reasons, the authors designated the three pairing models as the basis of a “universal proteomic code” that accounts for protein structures and functions, including protein-protein interactions.

Hence, while the genetic code is well known to determine amino acid sequences from DNA, the universal proteomic code should determine which amino acid building blocks prefer to interact with, or avoid, each other through space. Like a hidden set of magnetic cues, this “second genetic code” should provide “contact maps” that guide proteins from open sequences into their correct shapes, and hence their functions.

A Universal Code?

So, is this really a second genetic code? The authors argue “yes”: embedded in the redundancy and structure of the genetic code is an additional layer of information that is logically capable of guiding how proteins fold and interact.

Think of the traditional genetic code as the script for a play. The universal proteomic code is the stage directions, showing the actors where to stand, when to move, and how to deliver the performance.

If proven and adopted, the proteomic code could enable knowledge-based predictions of protein 3D structure and function, improving the design of new enzymes, vaccines, or synthetic proteins; help us understand how genetic mutations affect structure and disease; and open the door to new types of gene-based therapies or diagnostics.

It might even change how we teach molecular biology.

What Comes Next?

So, if the proteomic code does exist (and the evidence is mounting), it could change how we approach biology. It offers a rational, interpretable layer for understanding protein folding and interactions. It could improve how we engineer new proteins for medicine, materials, or diagnostics. It would help explain why the genetic code is built the way it is: not just for redundancy, but for function. And it challenges the notion that the genetic code is just a lookup table; it might be a map of shapes, not just sequences.

The authors call for this new code to be tested further.

What’s radical about this approach is that it doesn’t need big data or machine learning. Instead, the team uses logic, genetics and biophysical principles to derive models that explain real protein behavior.

As Andrew Miller explains it: “It’s not anti-AI, it’s post-AI.”

While most of the world is racing to plug protein problems into neural networks, this method looks for underlying principles that explain why certain interactions happen. These principles can then guide knowledge-based computation, serving as a new kind of input or structural constraint.

But even at this early stage, the study hints at a deeper logic behind life’s architecture, one that ties together the genetic code, protein structure and function in ways we’re only beginning to understand.

After all, nature rarely wastes space. If there’s a hidden layer of instruction in our DNA, it is probably there for a reason.

Footnote 1

The GU model is based on the fact that Gxx and xUx codons (codons with G in the first position or U in the second), which occupy two whole sides of the standard genetic code table, typically code for hydrophobic and small or negatively charged amino acids that are generally known to be capable of specific pairwise interactions with each other.
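To make the GU region concrete, the short Python sketch below lists which amino acids the standard genetic code (NCBI translation table 1) assigns to Gxx and xUx codons. It is only an illustration built from the standard codon table; the helper name `gu_amino_acids` is hypothetical, and the sketch does not reproduce the authors’ pairing rules or statistics.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), RNA alphabet.
# Codons are enumerated UUU, UUC, UUA, UUG, UCU, ... GGG; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def gu_amino_acids(code=CODE):
    """Hypothetical helper: one-letter codes of amino acids encoded by Gxx or xUx codons."""
    return sorted(
        {aa for codon, aa in code.items()
         if (codon[0] == "G" or codon[1] == "U") and aa != "*"}
    )


print(gu_amino_acids())  # ['A', 'D', 'E', 'F', 'G', 'I', 'L', 'M', 'V']
```

The result (Ala, Asp, Glu, Phe, Gly, Ile, Leu, Met, Val) fits the description above: hydrophobic residues, plus the small Ala and Gly and the negatively charged Asp and Glu.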

The Transmuted model arises from the idea that every codon-directed amino acid should be capable of specific interactions through space with the amino acid directed by the corresponding complementary or antisense codon, taking into consideration the possibility of mutations.
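A minimal sketch of the antisense idea, assuming that the “antisense codon” is the reverse complement of a codon written 5'→3' (a common convention) and ignoring the mutation allowances mentioned above. The function names are hypothetical, and this is not the published Transmuted model, only the basic codon-to-antisense-codon pairing the description refers to.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), RNA alphabet, "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense(codon):
    """Reverse complement of an RNA codon, written 5'->3' (assumed convention)."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def antisense_pairs(code=CODE):
    """Hypothetical helper: unordered (amino acid, antisense partner) pairs, stops skipped."""
    pairs = set()
    for codon, aa in code.items():
        partner = code[antisense(codon)]
        if aa != "*" and partner != "*":
            pairs.add(tuple(sorted((aa, partner))))
    return sorted(pairs)


print(antisense("AUG"), CODE[antisense("AUG")])  # CAU H  (Met pairs with His)
print(("F", "K") in antisense_pairs())           # True (UUU Phe <-> AAA Lys)
```

Reading the complement base by base without reversing it gives a different pairing table, and the mutation allowances are not modelled here, so treat this only as a starting point.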

The Shift model is based on what happens when DNA is read slightly out of frame, like a typo that still makes sense.