轉(zhuǎn)自:http://blog./uid-23027556-id-96691.html
一. 發(fā)現(xiàn)
1.在主庫執(zhí)行insert語句, 發(fā)現(xiàn)從庫上沒有相應(yīng)的執(zhí)行。
2.使用show processlist,沒有發(fā)現(xiàn)SQL IO的進(jìn)程
mysql> show processlist;
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
|
 Id | User        | Host      | db   | Command | Time | 
State                                                                      
 | Info             |
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
| 
 3 | root        | localhost | an   | Query   |    0 | 
NULL                                                                       
 | show processlist |
| 12
 | system user |           | NULL | Connect |  117 | Waiting for master 
to send event                                            | 
NULL             |
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
3.查看err日志
101008
 10:04:03 [Note] Slave SQL thread initialized, starting replication in 
log 'bin-log.000002′ at position 198, relay log 
'/var/local/mysql/logdir/relay_log.000004′ position: 342
101008
 10:04:28 [ERROR] Slave SQL: Error 'Table 'an_tb’ already exists’ on 
query. Default database: 'an’. Query: 'create table an_tb (id int) 
type=innodb’, Error_code: 1050
101008
 10:04:28 [Warning] Slave: The syntax 'TYPE=storage_engine’ is 
deprecated and will be removed in MySQL 6.0. Please use 
'ENGINE=storage_engine’ instead Error_code: 1287
101008 10:04:28 [Warning] Slave: Table 'an_tb’ already exists Error_code: 1050
101008
 10:04:28 [ERROR] Error running query, slave SQL thread aborted. Fix the
 problem, and restart the slave SQL thread with “SLAVE START”. We 
stopped at log 'bin-log.000002′ position 98
4.根據(jù)陶方的提醒,使用show slave status\G
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File: bin-log.000006
Read_Master_Log_Pos: 612
Relay_Log_File: relay_log.000004
Relay_Log_Pos: 242
Relay_Master_Log_File: bin-log.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1050
Last_Error: Error 'Table 'an_tb’ already exists’ on query. Default database: 'an’. Query: 'create table an_tb (id int) type=innodb’
Skip_Counter: 0
Exec_Master_Log_Pos: 98
Relay_Log_Space: 14332
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1050
Last_SQL_Error:
 Error 'Table 'an_tb’ already exists’ on query. Default database: 'an’. 
Query: 'create table an_tb (id int) type=innodb’
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
總結(jié)原因: slave服務(wù)器在執(zhí)行relay log的時(shí)候,有個(gè)sql執(zhí)行不下去了,卡住了。
該sql為:create table an_tb (id int) type=innodb’
二. 解決方法:
第一種:基本思路是drop掉an_tb表, 結(jié)果:不成功!
過程是:
在從庫上:
drop table an_tb;
stop slave;
start slave;
結(jié)果是:報(bào)同樣的錯(cuò)誤。 因?yàn)槊看蝧tart slave之后,都會(huì)從relay log中讀出create table an_tb的SQL;
第二種:基本思路是跳過relay log, 結(jié)果:不成功!
簡單的過程:
MYSQL> stop slave;
MYSQL>change master to
master_log_file=’bin-log.000006′,
master_log_pos=612,
RELAY_LOG_FILE=’relay_log.000023′,
RELAY_LOG_POS=612;
MYSQL> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File: bin-log.000006
Read_Master_Log_Pos: 612
Relay_Log_File: relay_log.000017
Relay_Log_Pos: 612
Relay_Master_Log_File: bin-log.000006
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 612
Relay_Log_Space: 15448
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
此時(shí)在show slave status中看不到錯(cuò)誤了, 但是在err文件中,還是有同樣的錯(cuò)誤的
101008
 10:40:37 [Note] Slave SQL thread initialized, starting replication in 
log 'bin-log.000002′ at position 607, relay log 
'/var/local/mysql/logdir/relay_log.000012′ position: 751
101008 10:40:37 [ERROR]
 Slave SQL: Error 'Table 'an_idb1′ already exists’ on query. Default 
database: 'an’. Query: 'create table an_idb1 (id int) type=innodb’, 
Error_code: 1050
101008
 10:40:37 [Warning] Slave: The syntax 'TYPE=storage_engine’ is 
deprecated and will be removed in MySQL 6.0. Please use 
'ENGINE=storage_engine’ instead Error_code: 1287
101008 10:40:37 [Warning] Slave: Table 'an_idb1′ already exists Error_code: 1050
101008
 10:40:37 [ERROR] Error running query, slave SQL thread aborted. Fix the
 problem, and restart the slave SQL thread with “SLAVE START”. We 
stopped at log 'bin-log.000002′ position 607
101008
 10:40:37 [Note] Slave I/O thread: connected to master 
'anmh@dbadb1.corp.alimama.com:3307′,replication started in log 
'bin-log.000006′ at position 612
但是發(fā)現(xiàn)現(xiàn)在的表變成了an_idb1了, 不再是an_tb。  看來這個(gè)方法是不對(duì)的。
第三種方法:思路還是要跳過relay log。采用reset slave的方法。結(jié)果是:不成功!
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.12 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: relay_log.000001
Relay_Log_Pos: 4
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 0
Relay_Log_Space: 126
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
可以看到reset slave是把相關(guān)值做默認(rèn)設(shè)置
然后使用
mysql> start slave;
Query OK, 0 rows affected (0.09 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File: bin-log.000006
Read_Master_Log_Pos: 612
Relay_Log_File: relay_log.000004
Relay_Log_Pos: 751
Relay_Master_Log_File: bin-log.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1050
Last_Error: Error 'Table 'an_idb1′ already exists’ on query. Default database: 'an’. Query: 'create table an_idb1 (id int) type=innodb’
Skip_Counter: 0
Exec_Master_Log_Pos: 607
Relay_Log_Space: 14332
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1050
Last_SQL_Error:
 Error 'Table 'an_idb1′ already exists’ on query. Default database: 
'an’. Query: 'create table an_idb1 (id int) type=innodb’
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
發(fā)現(xiàn)還是沒有啟動(dòng)slave io進(jìn)程, 查看err日志
101008 10:53:52 [Note] Slave I/O thread killed while reading event
101008 10:53:52 [Note] Slave I/O thread exiting, read up to log 'bin-log.000006′, position 612
101008
 10:58:15 [Note] Slave SQL thread initialized, starting replication in 
log 'bin-log.000006′ at position 612, relay log 
'/var/local/mysql/logdir/relay_log.000017′ position: 612
101008 10:58:15 [ERROR] Error in Log_event::read_log_event(): 'read error’, data_len: 258, event_type: 0
101008 10:58:15 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
101008 10:58:15 [ERROR] Slave
 SQL: Relay log read failure: Could not parse relay log event entry. The
 possible reasons are: the master’s binary log is corrupted (you can 
check this by running 'mysqlbinlog’ on the binary log), the slave’s 
relay log is corrupted (you can check this by running 'mysqlbinlog’ on 
the relay log), a network problem, or a bug in the master’s or slave’s 
MySQL code. If you want to check the master’s binary log or slave’s 
relay log, you will be able to know their names by issuing 'SHOW SLAVE 
STATUS’ on this slave. Error_code: 1594
101008
 10:58:15 [ERROR] Error running query, slave SQL thread aborted. Fix the
 problem, and restart the slave SQL thread with “SLAVE START”. We 
stopped at log 'bin-log.000006′ position 612
101008
 10:58:15 [Note] Slave I/O thread: connected to master 
'anmh@dbadb1.corp.alimama.com:3307′,replication started in log 
'bin-log.000006′ at position 612
報(bào)新的錯(cuò)誤了,binlog都掛了。呵呵,這說明reset slave之后, slave的一切設(shè)置都被清空了。
這也讓我想起了,可以繼續(xù)使用change master to來做設(shè)置。
第四種,終于成功了的方法。 基本思路還是跳過relay log
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> reset slave;
Query OK, 0 rows affected (0.13 sec)
mysql> change master to
-> master_log_file=’bin-log.000006′,
-> master_log_pos=612;
Query OK, 0 rows affected (0.18 sec)
在這里的master log和master log pos一定要是最新的,或者是比較新的。(總之要跳過報(bào)錯(cuò)的sql)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File: bin-log.000006
Read_Master_Log_Pos: 612
Relay_Log_File: relay_log.000002
Relay_Log_Pos: 242
Relay_Master_Log_File: bin-log.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 612
Relay_Log_Space: 392
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
mysql> show processlist;
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
|
 Id | User        | Host      | db   | Command | Time | 
State                                                                      
 | Info             |
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
| 
 3 | root        | localhost | an   | Query   |    0 | 
NULL                                                                       
 | show processlist |
|
 12 | system user |           | NULL | Connect |  117 | Waiting for 
master to send event                                            | 
NULL             |
|
 13 | system user |           | NULL | Connect |   51 | Slave has read 
all relay log; waiting for the slave I/O thread to update it | 
NULL             |
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
3 rows in set (0.00 sec)
終于不報(bào)錯(cuò)啦, 而且SQL IO進(jìn)程也啟動(dòng)起來了。 一切ok。