【原】NMDS分析

科白君 2021-09-08

展開全文

“ No one knows everything, and you don't have to.” --free傻孩子

"R數(shù)據(jù)分析"專題·第15篇

編輯 | free傻孩子

4445字 | 7分鐘閱讀

本期推送內(nèi)容：

本節(jié)為大家介紹一種常用的排序方法NMDS分析，全稱為非度量多維尺度分析 (non-metric multidimensional scaling)。NMDS是一種將多維空間的研究對(duì)象（樣本或變量）簡(jiǎn)化到低維空間進(jìn)行定位、分析和歸類，同時(shí)又保留對(duì)象間原始關(guān)系的數(shù)據(jù)分析方法。目前網(wǎng)絡(luò)上關(guān)于NMDS分析的介紹和論述已經(jīng)很多了，本公眾號(hào)就不再贅述了。本節(jié)的關(guān)注點(diǎn)是如何繪制漂亮的NMDS散點(diǎn)圖。

哪些數(shù)據(jù)適宜NMDS分析

也許大家已經(jīng)發(fā)現(xiàn)NMDS分析大多情況下是用來展示物種數(shù)據(jù)的一種分析方法，為什么呢？這是因?yàn)楫?dāng)前流行使用的物種數(shù)據(jù)大多數(shù)為OTU或ASV測(cè)序數(shù)據(jù)，這類數(shù)據(jù)包含豐富的0值。NMDS分析一般使用的是Bray-Curtis距離算法，該算法對(duì)0值不敏感，換句話說即使有很多的0值的情況下也能獲得較為穩(wěn)健的結(jié)果。因此，NMDS分析適宜于測(cè)序數(shù)據(jù)的分析。然而，因?yàn)樵摲椒ㄊ褂玫氖欠菂?shù)的方法，所以不能給出每一軸對(duì)于數(shù)據(jù)分布的解釋量（如有錯(cuò)誤請(qǐng)指正），這是該方法存在的局限性。

加載NMDS分析所需要的包，如下：

library(tidyverse)
library(vegan)
library(MASS)
library(readxl)

數(shù)據(jù)導(dǎo)入

readxlsx <- function(file = "file.xlsx", n =3) {
  require(readxl)
  dat <- list()
  i = 0
  while (i < n) {
    i = i+1
    dat[[i]] <- read_excel(file, sheet = i, col_names = T)
  }
  return(dat)
}

otu <- readxlsx(file = "1-16S.xlsx", n =5)

數(shù)據(jù)處理

ITS <- otu[[4]] %>% 
  data.frame()#tibble data change to dataframe data
rownames(ITS) <- ITS$OTUid#defined row names
ITS2 <- ITS[,-1] %>% 
  t() %>% #transposition
  data.frame()#row names are sample names, and colnum names are OTU id

head(ITS2[,1:6])

數(shù)據(jù)格式如下：

ITS.nmds<-metaMDS(ITS2)
ITS.nmds#The smaller the value of stress, the better the goodness of fit
#當(dāng)stress > 0.2時(shí)表明使用該方法不合適，建議使用其它方法對(duì)數(shù)據(jù)進(jìn)行#分析
stressplot(ITS.nmds)#檢查觀測(cè)值非相似性與排序距離之間的關(guān)系

擬合結(jié)果顯示，沒有點(diǎn)出現(xiàn)在距離線段很遠(yuǎn)的位置，意味著該數(shù)據(jù)可以使用NMDS分析。

簡(jiǎn)單出圖

1). 只顯示樣方點(diǎn)

ordiplot(ITS.nmds, type = "text",display = "sites")

2). 只顯示物種信息

ordiplot(ITS.nmds, type = "text",display = "species")

ggplot2繪圖

提取樣方點(diǎn)

#提取樣方點(diǎn)
ITS.scores <- as.data.frame(scores(ITS.nmds)) #提取點(diǎn)
ITS.scores %>% 
  as_tibble(rownames = "sample") ->ITS_sites
ITS_sites

根據(jù)處理給數(shù)據(jù)分組

otu[[3]] %>% 
  dplyr::select(Code,Tdiff) %>% 
  mutate(group = if_else(Tdiff>0,"warmer",
                         if_else(Tdiff<0,"colder","in_situ"))) ->group
group

將分組信息添加到NMDS數(shù)據(jù)樣點(diǎn)中

ITS_sites %>% 
  left_join(group, by = c("sample" = "Code")) %>% 
  filter(group!="NA")->ITS_sites2
ITS_sites2

因?yàn)槲矣玫氖且寻l(fā)表文章中的數(shù)據(jù)，數(shù)據(jù)給出的樣方信息和分組信息數(shù)量不匹配所以我過濾掉了不匹配的部分，如果處理自己的數(shù)據(jù)則不必使用filter函數(shù)。

繪圖背景等參數(shù)設(shè)置（直接粘貼并運(yùn)行）

main_theme = theme(panel.background=element_blank(),
                   panel.grid=element_blank(),
                   axis.line.x=element_line(size=0.5, colour="black"),
                   axis.line.y=element_line(size=0.5, colour="black"),
                   axis.ticks=element_line(color="black"),
                   axis.text=element_text(color="black", size=12),
                   legend.position="right",
                   legend.background=element_blank(),
                   legend.key=element_blank(),
                   legend.text= element_text(size=12),
                   text=element_text(family="sans", size=12),
                   plot.title=element_text(hjust = 0.5,vjust=0.5,size=12),
                   plot.subtitle=element_text(size=12))

繪圖

ggplot(data=ITS_sites2,aes(NMDS1,NMDS2)) +
  geom_hline(aes(yintercept=0),colour="#d8d6d6",linetype=5)+ 
  geom_vline(aes(xintercept=0),colour="#d8d6d6",linetype=5)+
  geom_point(aes(color = group),shape = 19,size = 3.5)+
  scale_color_manual(values = c("#2166ac","#f4a582","#e31a1c"))+
  #scale_x_continuous(breaks = seq(-0.59,0.66,0.2),limits = c(-0.59,0.66))+
  #scale_y_continuous(breaks = seq(-0.60,0.45,0.15),limits = c(-0.60,0.45))+
  labs(x= "NMDS1", y = "NMDS2",color = "Treatments")+
  theme_bw() + 
  main_theme

分組NMDS和“等溫線”NMDS

分組NMDS

ggplot(data=ITS_sites2,aes(NMDS1,NMDS2)) +
  geom_hline(aes(yintercept=0),colour="#d8d6d6",linetype=5)+ 
  geom_vline(aes(xintercept=0),colour="#d8d6d6",linetype=5)+
  geom_point(aes(color = group),shape = 19,size = 3.5)+
  scale_color_manual(values = c("#2166ac","#f4a582","#e31a1c"))+
  #scale_x_continuous(breaks = seq(-0.59,0.66,0.2),limits = c(-0.59,0.66))+
  #scale_y_continuous(breaks = seq(-0.60,0.45,0.15),limits = c(-0.60,0.45))+
  stat_ellipse(aes(fill=group),geom="polygon",level=0.95,alpha=0.15)+
  labs(x= "NMDS1", y = "NMDS2",
       color = "Treatments",fill = "Treatments")+
  theme_bw() + 
  main_theme

“等溫線”NMDS

“等溫線”NMDS適用于處理比較多的情況，如梯度等

1）重新構(gòu)建分組

ITS_sites2 %>% 
  mutate(group2=if_else(Tdiff< -5.7,"very cold",
                        if_else(Tdiff< 0,"cold",
                                if_else(Tdiff<5.7, "in situ",
                                        if_else(Tdiff <9.6, "warm","hot"))))) ->ITS_sites3
ITS_sites3

ggplot(data=ITS_sites3,aes(NMDS1,NMDS2)) +
  geom_hline(aes(yintercept=0),colour="#d8d6d6",linetype=5)+ 
  geom_vline(aes(xintercept=0),colour="#d8d6d6",linetype=5)+
  geom_point(aes(color = group2),shape = 19,size = 3.5)+
  #scale_color_manual(values = c("#2166ac","#f4a582","#e31a1c"))+
  #scale_x_continuous(breaks = seq(-0.59,0.66,0.2),limits = c(-0.59,0.66))+
  #scale_y_continuous(breaks = seq(-0.60,0.45,0.15),limits = c(-0.60,0.45))+
  stat_density2d(aes(color = group2),size = 0.6)+
  labs(x= "NMDS1", y = "NMDS2",
       color = "Treatments")+
  theme_bw() + 
  main_theme