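A. Obtaining structural information on dynamic protein processes to establish precise structure-function relationships is a major challenge:
Proteins are the cornerstone of life. The many functions of a biological system depend on the expression of different proteins in different forms and to different extents, and understanding a protein's function rests on accurate knowledge of its structure. In particular, the real-time structural changes of a protein during dynamic processes are crucial for revealing its properties in a specific environment and for advancing modern life science and medical and pharmaceutical research. Accordingly, three of the 125 frontier questions in modern science posed by the journal Science concern the determination of protein structure. How to obtain structural information on dynamic protein processes, and thereby establish precise structure-function relationships, is the central problem of protein structural research.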
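B. Developing molecular spectroscopy that responds rapidly and probes dynamic protein structures in real time is an important scientific problem:
Molecular spectroscopy determines protein structure by measuring how a protein responds to light, exploiting the fact that different proteins have different optical signatures. X-ray spectroscopy, infrared (IR) absorption spectroscopy, Raman spectroscopy, and circular dichroism can all identify a protein's “optical fingerprint”. IR spectroscopy in particular, owing to its high sensitivity to changes in secondary structure, has become an important tool for determining protein structure (Nature 2020, 577, 52-59; Science 2016, 353, 1040-1044; Chem. Rev. 2017, 117, 10623-10664).
Inferring structure from the IR “optical fingerprint” depends on comparison with, and confirmation by, theoretical simulation. However, theoretical simulation of protein spectra faces a severe computational bottleneck. The structure of a protein in solution reflects the combined interactions between the solute and its surroundings; the large number of atoms and the enormous number of degrees of freedom make accurate spectral calculations very challenging, which in turn limits the interpretation of experimental spectra and the development of in situ spectroscopic probes. How spectral simulation can enable molecular spectroscopy that responds rapidly and probes dynamic protein structures in real time is therefore an important scientific problem.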
A.µ°°×ÖÊÄ£Ð͹þÃܶÙÁ¿µÄ¹¹½¨£º
µ°°×ÖÊÔÚºìÍâ¹âÆ×ÖÐÓкܶàÌØÕ÷ÎüÊÕ´ø,ÆäÖÐõ£°· I ´ø(1600-1700cm-1),°üº¬Á˵°°×ÖʷḻµÄ¶þ¼¶½á¹¹ÐÅÏ¢£¬Èç¦Á-ÂÝÐý¡¢¦Â-ÕÛµþ¡¢¦Â-ת½Ç¡¢¾íÇúµÈ£¬ Òò´Ë²âÁ¿õ£°· I ´øºìÍâ¹âÆ×¿É»ñµÃµ°°×Öʵġ°¹âÑ§Ö¸ÎÆ¡±ÐÅÏ¢¡£È»¶øµ°°×ÖÊ·Ö×ÓÖеÄÔ×ÓÊý³É°ÙÉÏǧ£¬½á¹¹×ÔÓɶÈÏ൱´ó£¬Èç¹ûÓÃÕû¸ö·Ö×ӵĽṹÐÅϢȥԤ²âµ¥Ò»µÄÆ×ѧÐźţ¬±äÁ¿Ì«¶àÇÒ²»¿É¿Ø£¬¹¹½¨»úÆ÷ѧϰģÐÍÏ൱ÄѶøÇÒЧ¹û²»ºÃ¡£¿ÉÐеĻúÆ÷ѧϰģÐÍ£¬±ØÐ뽨Á¢ÔÚ¶Ôµ°°×ÖÊ·Ö×ӽṹµÄºÏÀí»®·ÖºÍ¶ÔƬ¶ÎµÄÐÔÖʽøÐÐѧϰºÍÔ¤²âµÄ»ù´¡ÉÏ£¬¼´·Ö¶øÖÎÖ®µÄ²ßÂÔ¡£
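Taking the amide I IR spectrum as an example, we constructed the Hamiltonian matrix shown in Figure 1. In this Hamiltonian describing the amide I vibrations, each diagonal element is the vibrational frequency ωi of one peptide bond, predicted by a neural network model trained on N-methylacetamide (NMA). The off-diagonal elements immediately flanking the diagonal are the vibrational coupling constants Jij between adjacent peptide bonds, predicted by a neural network model trained on glycine dipeptide (GLDP). The remaining off-diagonal elements are the couplings between non-adjacent peptide bonds.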
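The Hamiltonian described above can be assembled roughly as in the following NumPy sketch. It assumes the site frequencies, nearest-neighbor couplings, transition dipoles, and dipole positions are already available. The function names are hypothetical. The non-adjacent elements are filled with the standard transition dipole coupling (TDC) expression, which we take to be the dipole approximation the protocol uses. `prefactor` is a placeholder for the unit-conversion constant, not a value from the paper.

```python
import numpy as np

def tdc_coupling(mu_i, mu_j, r_i, r_j, prefactor=1.0):
    """Transition dipole coupling between two oscillators (assumed form):
    J = prefactor * [mu_i.mu_j / r^3 - 3 (mu_i.r)(mu_j.r) / r^5].
    `prefactor` stands in for the unit-conversion constant (hypothetical)."""
    r_vec = np.asarray(r_j, float) - np.asarray(r_i, float)
    r = np.linalg.norm(r_vec)
    mi, mj = np.asarray(mu_i, float), np.asarray(mu_j, float)
    return prefactor * (mi @ mj / r**3 - 3.0 * (mi @ r_vec) * (mj @ r_vec) / r**5)

def build_amide1_hamiltonian(omega, j_nn, mu, pos, prefactor=1.0):
    """Assemble a symmetric N x N one-exciton Hamiltonian (cm^-1).

    omega : (N,) site frequencies, the diagonal (e.g. from an NMA model)
    j_nn  : (N-1,) couplings between adjacent peptide bonds (e.g. from a GLDP model)
    mu    : (N, 3) transition dipoles; pos : (N, 3) dipole positions
    """
    n = len(omega)
    h = np.diag(np.asarray(omega, float))
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1:
                c = j_nn[i]  # adjacent peptide bonds: learned coupling
            else:
                c = tdc_coupling(mu[i], mu[j], pos[i], pos[j], prefactor)
            h[i, j] = h[j, i] = c  # keep the matrix symmetric
    return h
```

Keeping the learned couplings only on the first off-diagonal mirrors the divide-and-conquer idea: each matrix element depends only on a small fragment of the protein.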
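B. Generating the machine learning data and choosing descriptors:
To sample NMA conformations with large structural differences, we ran seven ab initio molecular dynamics (AIMD) simulations from different initial conformations, totaling 241.5 ps, and sampled every 50 steps to avoid correlation between successive structures. To account for the effect of solvent on the peptide-bond vibrational frequency, we extracted from the trajectories 9660 configurations, each consisting of the NMA molecule and the water molecules within 5 Å of it, for quantum chemical calculations. For the GLDP dipeptide, to save computational resources, we generated initial conformations by a systematic scan of the Ramachandran angles (-180° ≤ ϕ ≤ 180°, -180° ≤ ψ ≤ 180°) in 5° steps. This produced 5128 dipeptide structures in total, which were used in quantum chemical calculations of the nearest-neighbor vibrational couplings.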
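The frame sampling and solvent-shell extraction described above might look like the following sketch. The helper names are hypothetical. Two assumptions apply: a water molecule is kept when its oxygen lies within the cutoff of any solute atom, and periodic boundary conditions are ignored. The paper may use a different criterion.

```python
import numpy as np

def frames_every(n_frames, stride=50):
    """Indices of MD frames kept when sampling every `stride` steps."""
    return list(range(0, n_frames, stride))

def waters_within(solute_xyz, water_o_xyz, cutoff=5.0):
    """Indices of waters whose oxygen lies within `cutoff` (Å) of any solute atom.

    Assumption: oxygen-to-nearest-solute-atom distance; no periodic images.
    """
    s = np.asarray(solute_xyz, float)[:, None, :]   # (n_solute, 1, 3)
    w = np.asarray(water_o_xyz, float)[None, :, :]  # (1, n_water, 3)
    d = np.linalg.norm(s - w, axis=-1)              # (n_solute, n_water)
    return np.flatnonzero(d.min(axis=0) <= cutoff)  # nearest solute atom per water
```

Each kept frame would then be cut down to the NMA molecule plus the selected waters before the quantum chemical calculation.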
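We used the Coulomb matrices of NMA and GLDP as descriptors and trained deep neural networks to learn and predict (ωi, μi, Jij), where μi is the transition dipole moment. All machine learning calculations were performed in TensorFlow. The Coulomb-matrix descriptor is rotation-invariant, whereas the transition dipole moment μi is a vector that rotates with the molecule. To remove this orientation dependence during training, we rotated every NMA molecule into a standard frame: the carbonyl C atom at the origin of the xyz coordinate system, the C-O bond along the positive y axis, and the O-C-N angle in the x-y plane. For a new NMA molecule, the μi predicted in this frame is multiplied by the inverse of the rotation matrix to recover its value in the original coordinate system.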
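The descriptor and the standard-frame rotation described above can be sketched in NumPy as follows. The Coulomb matrix uses the common definition M_ii = 0.5 Z_i^2.4 and M_ij = Z_i Z_j / |R_i - R_j|. Whether the paper includes all atoms or applies sorting or padding is not stated, so treat that as an assumption. The function names are hypothetical.

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix: M_ii = 0.5 Z_i^2.4, M_ij = Z_i Z_j / |R_i - R_j|."""
    Z = np.asarray(Z, float)
    R = np.asarray(R, float)
    d = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(d, 1.0)        # avoid 0/0; the diagonal is overwritten below
    M = np.outer(Z, Z) / d
    np.fill_diagonal(M, 0.5 * Z**2.4)
    return M

def standard_frame(c, o, n):
    """Rotation matrix for the frame in the text: carbonyl C at the origin,
    C-O along +y, O-C-N angle in the x-y plane. Rows are the new x, y, z axes."""
    c, o, n = (np.asarray(v, float) for v in (c, o, n))
    y = (o - c) / np.linalg.norm(o - c)
    v = n - c
    x = v - (v @ y) * y             # component of C->N perpendicular to C-O
    x /= np.linalg.norm(x)
    z = np.cross(x, y)              # right-handed frame: x × y = z
    return np.vstack([x, y, z])

def to_frame(R, c, rot):
    """Express coordinates R (n, 3) in the standard frame."""
    return (np.asarray(R, float) - np.asarray(c, float)) @ rot.T

def dipole_back_to_lab(mu_frame, rot):
    """Map a dipole predicted in the standard frame back to the original frame.
    For a rotation matrix the inverse is simply the transpose."""
    return rot.T @ np.asarray(mu_frame, float)
```

Because the rotation matrix is orthogonal, its transpose gives the inverse exactly, so no matrix inversion is needed.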
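A. Evaluating the machine learning models:
We evaluated the predictive performance of the neural network models by cross-validation. Figure 2 shows that the networks predict the frequencies and vibrational coupling constants very well, because these depend mainly on the ground-state structure. Transition properties such as the vibrational transition dipole moment, by contrast, involve two different vibrational states and are more sensitive to structural changes. Because our descriptors contain only ground-state information, more outliers appear for these quantities. Overall, our predictions of (ωi, μi, Jij) show high Pearson correlation coefficients (r > 0.9) and very low errors, which ensures the accuracy of the vibrational exciton Hamiltonian we build from them.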
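A minimal sketch of this kind of evaluation is below: k-fold cross-validation producing out-of-fold predictions, scored with the Pearson r and RMSE. The `fit` callable is a hypothetical stand-in for training one of the neural networks. It must return a prediction function. The fold count and seed are illustrative choices.

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

def cross_validated_predictions(X, y, fit, k=5, seed=0):
    """Out-of-fold predictions; `fit(X_train, y_train)` returns a predict function."""
    y_hat = np.empty(len(y), dtype=float)
    for train, test in kfold_indices(len(y), k, seed):
        predict = fit(X[train], y[train])
        y_hat[test] = predict(X[test])   # each sample is predicted exactly once
    return y_hat

def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Scoring only out-of-fold predictions means every point in a parity plot like Figure 2 comes from a model that never saw it during training.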
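Figure 3 shows the complete workflow for predicting protein IR spectra with machine learning. First, the protein is split into individual peptide bonds and dipeptides. The ωi and μi predicted by the NMA neural network (NN) model are used to generate the diagonal elements of the Hamiltonian and, via a dipole approximation, the off-diagonal elements arising from couplings between non-adjacent peptide bonds. The Jij values predicted by the GLDP NN model serve as the vibrational couplings between neighboring peptide bonds and fill the corresponding off-diagonal elements. Finally, the full model Hamiltonian is diagonalized to obtain the protein's IR spectrum. We have also released this ML tool online for real-time prediction of protein spectra (http://dcaiku.com:12880/platform/first).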
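The final step, diagonalizing the Hamiltonian to obtain a spectrum, could look like the sketch below. It uses the usual exciton-model intensity |Σi cik μi|² for eigenstate k, Lorentzian line broadening, and averaging over MD snapshots. The 10 cm⁻¹ half-width is an arbitrary illustrative value, and all function names are hypothetical.

```python
import numpy as np

def stick_spectrum(h, mu):
    """Eigen-frequencies and IR intensities of a one-exciton Hamiltonian.

    h  : (N, N) symmetric Hamiltonian in cm^-1
    mu : (N, 3) site transition dipoles
    Intensity of eigenstate k is |sum_i c_ik mu_i|^2.
    """
    freqs, vecs = np.linalg.eigh(h)        # columns of `vecs` are eigenvectors
    mu_k = vecs.T @ np.asarray(mu, float)  # (N, 3) exciton transition dipoles
    return freqs, np.sum(mu_k**2, axis=1)

def broaden(freqs, intens, grid, hwhm=10.0):
    """Sum of area-normalized Lorentzians, half-width `hwhm` (assumed value)."""
    d = np.asarray(grid, float)[:, None] - np.asarray(freqs, float)[None, :]
    lorentz = hwhm / np.pi / (d**2 + hwhm**2)
    return (np.asarray(intens, float)[None, :] * lorentz).sum(axis=1)

def average_spectrum(snapshots, grid, hwhm=10.0):
    """Average broadened spectra over MD snapshots given as (h, mu) pairs."""
    return np.mean(
        [broaden(*stick_spectrum(h, mu), grid, hwhm) for h, mu in snapshots],
        axis=0,
    )
```

For two identical, parallel, coupled oscillators, all the intensity goes to the in-phase combination, which illustrates how couplings reshape the band.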
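B. Resolving protein secondary structure with the ML protocol:
To test the accuracy of the spectra predicted by our machine learning protocol, we studied 12 proteins of different types, that is, with different proportions of α-helix and β-sheet. We predicted their IR spectra with machine learning and compared them with experimental spectra.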
Table 1. Comparison of ML-predicted and experimental protein IR spectra. The root-mean-square error (RMSE) and the Spearman rank correlation coefficient (ρ) quantify the agreement with experiment; a high ρ indicates good quantitative agreement. The structures of the 12 proteins, which differ in size, were taken from the Protein Data Bank and cover a diverse range of secondary-structure contents, i.e., different fractions of α-helix and β-sheet. The IR spectrum of each protein was computed from 1000 MD configurations. All reported calculation times refer to calculations on eight cores of an Intel(R) Xeon(R) CPU (E5-2683 v4 @ 2.1 GHz).
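We measured the similarity between simulated and experimental spectra with the Spearman rank correlation coefficient ρ. Figure 4 and Table 1 show that the predictions agree well with experiment: ρ > 0.80 for 11 proteins, and only 1DHR has a lower ρ of 0.71. Because machine learning speeds up spectral simulation enormously, we can compute the IR spectrum from 1000 MD snapshots of each protein and thereby capture its dynamic features. The same task would be very expensive with direct quantum chemical calculations. Overall, the ML-predicted spectra successfully reproduce the essential features of the experimental spectra (main peak positions and line shapes).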
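For reference, the Spearman ρ used above is the Pearson correlation of the ranks. The sketch below computes it for two spectra sampled on the same frequency grid, with tied values given their average rank. The function names are hypothetical. `scipy.stats.spearmanr` computes the same quantity.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n; tied values receive the mean of the ranks they span."""
    x = np.asarray(x, float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1                                 # extend the run of ties
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks of a and b."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])
```

Because ρ depends only on ranks, it is unchanged by any strictly increasing rescaling of either spectrum's intensities. A global intensity normalization therefore does not affect it, whereas an RMSE does depend on normalization.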
To further test the robustness and transferability of our machine learning model, we simulated the IR spectra of ubiquitin at different temperatures (1.6 °C, 28.6 °C, 55.6 °C, and 82.6 °C). The figure shows that the ubiquitin IR spectrum gradually blue-shifts as the temperature rises, and the ML results agree well with the experimental spectra. This indicates that the model transfers well to changing external conditions such as temperature. We then used the model to follow the folding of the Trp-Cage protein. As folding proceeds, the IR spectrum red-shifts by 10 cm⁻¹ (S1: 1652 cm⁻¹, S25: 1650 cm⁻¹, S50: 1646 cm⁻¹, S75: 1644 cm⁻¹, S100: 1642 cm⁻¹), consistent with previous experimental and theoretical results. This again shows that our model accurately resolves changes in protein secondary structure and can be used to follow protein folding in real time.
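Tracking shifts like these requires locating the band maximum on a discrete frequency grid. The sketch below does this with a three-point parabolic refinement of the argmax and labels the sign of the shift. The refinement, the uniform-grid assumption, and the function names are illustrative choices, not necessarily the authors' procedure.

```python
import numpy as np

def peak_position(grid, spectrum):
    """Frequency of the spectral maximum, refined by a 3-point parabola.
    Assumes a uniformly spaced `grid`."""
    grid = np.asarray(grid, float)
    s = np.asarray(spectrum, float)
    k = int(np.argmax(s))
    if 0 < k < len(s) - 1:
        y0, y1, y2 = s[k - 1], s[k], s[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if denom != 0:
            step = grid[1] - grid[0]
            return grid[k] + 0.5 * (y0 - y2) / denom * step  # parabola vertex
    return grid[k]

def classify_shift(f_ref, f_new):
    """Positive difference = blue shift (higher wavenumber); negative = red shift."""
    d = f_new - f_ref
    return d, "blue" if d > 0 else "red" if d < 0 else "none"
```

With the Trp-Cage peak positions quoted above, `classify_shift(1652, 1642)` returns `(-10, "red")`, matching the 10 cm⁻¹ red shift.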
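Summary and Outlook
We report a machine learning protocol, built on first-principles data, that accurately predicts the amide I IR spectra of proteins from their ground-state structural information. Compared with conventional quantum chemical methods, it greatly accelerates the theoretical simulation of protein IR spectra. More importantly, the model transfers well. It can predict the spectral response of proteins outside the range of the training set and simulate signal changes under different conditions. This makes it possible to resolve protein secondary structure, examine temperature effects, and follow protein folding. We are currently improving the model's accuracy by enlarging the training set and accounting for explicit solvent effects. We are also exploring extensions to other spectroscopic properties, including ultraviolet (UV) absorption, Raman, sum-frequency generation (SFG), and multidimensional spectroscopies.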
Source
Paper title: A Machine Learning Protocol for Predicting Protein Infrared Spectra.
Paper link: http://dx.doi.org/10.1021/jacs.0c06530.
Science editors' recommendation of our machine learning-quantum chemistry technique