Out of 25,000 features originally detected by metabolic profiling of E. coli, fewer than 1,000 represent unique metabolites, a study finds.
Metabolomics screens can detect thousands of different compounds in a given sample, but contrary to the assumptions of numerous studies, not every detected compound represents a unique metabolite. Far from it, according to a study published in Analytical Chemistry.
Metabolomics researchers Gary Patti and Nathaniel Mahieu of Washington University in St. Louis report that out of about 25,000 compounds detected in E. coli by liquid chromatography-mass spectrometry (LC/MS), 90 percent were not unique metabolites. Rather, the same metabolite, fragmented or carrying chemical additions, is detected multiple times, a phenomenon known as degeneracy. A second analysis, designed to weed out contaminants and artifacts in addition to degeneracy, confirmed that just three percent of the observed compounds are bona fide, unique metabolites.
“This study confirms what I think a lot of people in the metabolomics world have known,” says University of Michigan endocrinologist Charles Burant, who was not involved in the work. “All these features that we see during mass spectroscopy, really, a lot of them are sort of junk.”
Scientists use metabolomics experiments to profile the small molecules (less than two kilodaltons in mass) present in a given group of cells. They also compare the metabolites present in healthy and diseased samples in the hope of better understanding a disorder. In metabolic profiling using LC/MS, researchers first extract the small molecules away from the macromolecules, which include genetic material and proteins. Next, in liquid chromatography, a column physically separates the extract's various components. Then a mass spectrometer gives each compound in the extract an electrical charge. By recording how the resulting ion moves in response to magnetic or electric fields, it measures the ion's mass-to-charge ratio (m/z), in effect weighing the compound.
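A mass spectrometer reports mass-to-charge ratio (m/z), not mass itself. The sketch below is a minimal illustration, assuming the common case of a molecule charged by picking up protons. The expected m/z for a neutral monoisotopic mass M carrying z protons is then (M + z × 1.007276)/z. The function name and the glucose example are illustrative choices, not taken from the study. Small metabolites are usually observed singly charged, so the charge-2 line only demonstrates the arithmetic.

```python
PROTON_MASS = 1.007276  # mass of a proton, in daltons

def protonated_mz(neutral_mass: float, charge: int = 1) -> float:
    """Expected m/z of a molecule that has gained `charge` protons, i.e. [M+zH]z+."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

# Illustrative input: monoisotopic mass of glucose (C6H12O6), in daltons.
GLUCOSE_MASS = 180.063388

print(f"[M+H]+   m/z = {protonated_mz(GLUCOSE_MASS):.4f}")
print(f"[M+2H]2+ m/z = {protonated_mz(GLUCOSE_MASS, 2):.4f}")
```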
Because identifying the chemicals behind mass spectrometry signals is quite difficult, Patti says, often in systems biology experiments, scientists will compare LC/MS signals between different samples or patient groups without having identified the underlying compounds. “That approach is very dangerous,” Patti says, “because if it’s true that in at least some of these metabolomics experiments that a lot of what we’re detecting are artifacts, contaminants, and degeneracies, then you worry how much of those types of comparisons are being modeled on data that correspond to noise.” Hopefully, most studies perform validation analyses that would catch these mistakes, Patti says, but it would be more efficient to avoid them in the first place.
LC/MS experiments typically reveal thousands of “peaks,” or features, each presumed to represent a single compound, many of which researchers cannot identify and therefore categorize as “unknown metabolites.” However, as the authors write, not all of these unidentifiable compounds are novel metabolites. A compound might also be unidentifiable because it’s a contaminant, an artifact, or an adduct: one compound bound to a second, charged molecule.
“Currently, there is a substantial component of the metabolomics communities that equates the number of features to a number of metabolites. However, this is totally erroneous, and Gary Patti and his group have [been] working to resolve this issue,” Lloyd Sumner, director of the University of Missouri Metabolomics Center, tells The Scientist in an email. Patti and graduate student Mahieu set out to determine how many of the LC/MS features from a sample of E. coli represented actual metabolites.
When the researchers ran their samples through the LC/MS, they detected about 25,000 compounds, which, they write, is typical for an untargeted metabolomics experiment—where researchers consider all detected compounds, rather than looking for particular ones. But they found that many features were due to metabolites forming adducts by binding to a charged particle.
Some adduct formation, such as the binding of a compound of interest to a hydrogen ion, is intentional and necessary to charge compounds for mass spectroscopy. “What was surprising was that there are many other types of adducts we’re seeing,” Patti says. Often, molecules formed adducts with each other, they found. Sometimes, compounds formed adducts with contaminants. “Because we have a lot of metabolites that are present simultaneously, we’re finding that a lot of these things are sticking together—not just dimers, but trimers and even more,” says Patti.
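One metabolite can give rise to many features because each adduct it forms lands at a different m/z. The sketch below computes where a single molecule's common positive-mode adducts would appear, including the dimer and trimer ions Patti describes. The adduct list and mass shifts are standard approximate values chosen for illustration, not the adduct set used in the study. Glucose is an arbitrary example molecule, and all ions are assumed to be singly charged.

```python
PROTON = 1.007276  # proton mass, in Da

# Illustrative adduct rules: name -> (molecules per ion, mass added in Da).
ADDUCTS = {
    "[M+H]+": (1, PROTON),
    "[M+Na]+": (1, 22.989218),
    "[M+K]+": (1, 38.963158),
    "[M+NH4]+": (1, 18.033823),
    "[2M+H]+": (2, PROTON),   # dimer: two molecules sharing one proton
    "[3M+H]+": (3, PROTON),   # trimer
}

def adduct_mz(neutral_mass: float) -> dict[str, float]:
    """Return the m/z (charge 1) at which each listed adduct of one molecule would appear."""
    return {name: round(n * neutral_mass + shift, 4)
            for name, (n, shift) in ADDUCTS.items()}

GLUCOSE = 180.063388  # monoisotopic mass of C6H12O6, in Da
for name, mz in adduct_mz(GLUCOSE).items():
    print(f"{name:10s} {mz:.4f}")
```

Every line printed is a separate feature in the data, even though all of them trace back to a single metabolite.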
After removing the extra signals caused by adducts and fragments, the number of potential unique metabolites was down to about 3,000, meaning that around 90 percent of the original mass spectrometry features were redundant.
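Going from about 25,000 features to about 3,000 is a reduction of (25,000 − 3,000)/25,000 = 88 percent, hence “around 90 percent.” The collapsing step itself can be sketched as grouping co-eluting features whose m/z values fit different adducts of one neutral mass. This is a deliberately simple, greedy illustration, not the annotation method Mahieu and Patti used. It makes several assumptions:

- All input features share a retention-time window.
- The lowest unassigned m/z in each group is the [M+H]+ ion.
- The adduct list and the 5 ppm tolerance are hypothetical.

```python
PROTON = 1.007276  # proton mass, in Da
# Hypothetical, simplified adduct rules: name -> (molecules per ion, mass added in Da).
OTHER_ADDUCTS = {
    "[M+Na]+": (1, 22.989218),
    "[M+K]+": (1, 38.963158),
    "[2M+H]+": (2, PROTON),
    "[3M+H]+": (3, PROTON),
}

def within_ppm(observed: float, expected: float, tol_ppm: float) -> bool:
    """True if observed is within tol_ppm parts per million of expected."""
    return abs(observed - expected) / expected * 1e6 <= tol_ppm

def collapse_features(mzs: list[float], tol_ppm: float = 5.0) -> list[dict]:
    """Greedily group co-eluting features that look like ions of one neutral molecule."""
    remaining = sorted(mzs)
    groups = []
    while remaining:
        base = remaining.pop(0)  # assume the lowest unassigned m/z is [M+H]+
        neutral = base - PROTON
        members = {"[M+H]+": base}
        for name, (n, shift) in OTHER_ADDUCTS.items():
            expected = n * neutral + shift
            for mz in remaining:
                if within_ppm(mz, expected, tol_ppm):
                    members[name] = mz
                    remaining.remove(mz)
                    break  # stop iterating right after mutating the list
        groups.append({"neutral_mass": round(neutral, 4), "features": members})
    return groups

# Hypothetical co-eluting features: four ions of one molecule plus one unrelated ion.
features = [181.0707, 203.0526, 361.1341, 541.1974, 150.0583]
groups = collapse_features(features)
print(f"{len(features)} features -> {len(groups)} putative metabolites")
for g in groups:
    print(g["neutral_mass"], sorted(g["features"]))
```

Here five features collapse to two putative metabolites: a glucose-like neutral mass carrying four ions, and one unmatched feature. Annotation tools used in practice typically also handle isotope peaks, in-source fragments, and ambiguous assignments, all of which this greedy pass ignores.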
The researchers used another method to detect two further kinds of spurious signal. Contaminants are compounds that did not originate from the sample but came, say, from a test tube or a solvent. Artifacts are signals due not to the presence of a compound but to some kind of technical fluke or data-processing glitch. The approach, called credentialing, involves growing one set of bacterial samples on glucose containing the heavy carbon isotope 13C. In parallel, another set is grown on regular glucose, which contains mostly 12C. The two samples are then mixed before the analysis.
On the mass-spectrometry read-out, any carbon-containing metabolite produced by the bacterial cell should produce two features: one representing the 12C-containing compound, the other representing the compound with the heavier 13C. To detect bona fide metabolites, Patti explains, “we look through all of the different signals, and we ask if a signal has a 13C dance partner? And because contaminants or artifacts come from non-biological sources, they’re not being made by the E. coli, they don’t have a dance partner.” Through this approach they detected 2,462 credentialed compounds. After removing adducts from the list, 892 bona fide metabolites remained—roughly three percent of the starting number.
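The “dance partner” test can be sketched as a search for co-eluting feature pairs whose m/z values differ by a whole number of 13C-for-12C substitutions. For a singly charged ion, each substitution adds about 1.003355 Da. This is a simplified illustration under stated assumptions, not the published credentialing software. The retention-time and m/z tolerances, the carbon limit, and the example feature list are all hypothetical. A real pipeline would also apply further checks that are omitted here, such as on the intensity ratio between partners.

```python
from typing import Optional

C13_SHIFT = 1.003355  # mass difference between 13C and 12C, in Da

def find_partner(mz: float, rt: float, features: list[tuple[float, float]],
                 max_carbons: int = 60, tol_mz: float = 0.002,
                 tol_rt: float = 0.1) -> Optional[tuple[float, int]]:
    """Return (partner m/z, carbon count) for a co-eluting heavier 13C partner, else None."""
    for other_mz, other_rt in features:
        # A partner must co-elute and sit at higher m/z than the 12C feature.
        if abs(other_rt - rt) > tol_rt or other_mz <= mz:
            continue
        n_carbons = round((other_mz - mz) / C13_SHIFT)
        expected = mz + n_carbons * C13_SHIFT
        if 1 <= n_carbons <= max_carbons and abs(other_mz - expected) <= tol_mz:
            return other_mz, n_carbons
    return None

def credential(features: list[tuple[float, float]]) -> list[tuple[float, float, int]]:
    """Keep (m/z, retention time) features that have a 13C partner; assumes charge 1."""
    credentialed = []
    for mz, rt in features:
        partner = find_partner(mz, rt, features)
        if partner is not None:
            credentialed.append((mz, rt, partner[1]))
    return credentialed

# Hypothetical mixed 12C/13C sample: (m/z, retention time in minutes).
features = [
    (181.0707, 2.10),  # 12C glucose-like ion
    (187.0908, 2.10),  # its fully 13C-labeled partner, six carbons heavier
    (391.2843, 9.80),  # contaminant with no labeled counterpart
]
print(credential(features))
```

In this toy input, the feature at m/z 181.0707 is credentialed with a six-carbon partner. The feature at m/z 391.2843, standing in for a contaminant, has no partner and is dropped.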
Patti stresses that this doesn’t mean there are only 892 metabolites in E. coli. Textbooks will tell you, he says, that E. coli produces many more than 900. In fact, the sheer number of known metabolites is part of the reason scientists found it easy to believe the high counts of putative metabolites their screens turned up, Patti says. “People sort of said, ‘There’s a lot of signals; there’s a lot of E. coli metabolites in the textbooks; they’re probably loosely correlated.’”
The results of the present study don’t necessarily mean that there aren’t thousands of E. coli metabolites, Patti says. “It just means we can’t detect them with this particular assay.”
Determining the chemical identity of detected compounds is time-consuming. “If you tried to do this for 25,000 and only 1,000 of them were real . . . you’d end up wasting a lot of time and resources,” Patti says. To save researchers the trouble, Patti and Mahieu have created a database (called creDBle) of credentialed features, starting with their E. coli dataset.
N.G. Mahieu, G.J. Patti, “Systems-level annotation of a metabolomics data set reduces 25000 features to fewer than 1000 unique metabolites,” Analytical Chemistry, doi:10.1021/acs.analchem.7b02380, 2017.