|
今天突然有一個需求,要把統(tǒng)計局網站下載的Word文檔里的表格提取出來,放到Excel表中,便于下一步進行數(shù)據分析。 1. 引入擴展庫 # -*- coding: utf-8 -*- import docx from docx import Document import xlwt; import xlrd; import glob 2. 讀取Word文檔中的表格 def readdoc(filename): doc = docx.Document(filename) tables = [] for table in doc.tables: table_temp = [] for row in table.rows: row_temp = [] for cell in row.cells: row_temp.append(cell.text) table_temp.append(row_temp) tables.append(table_temp) return tables 3. 寫入Excel文件 def writeExcel(tables,filename): Sheet_index = 0 workbook = xlwt.Workbook(encoding='utf-8') for table in tables: worksheet = workbook.add_sheet('sheet' + str(Sheet_index),cell_overwrite_ok = True) Sheet_index = Sheet_index + 1 for rows in table: r = table.index(rows) for cell in rows: c = rows.index(cell) print(r,c,cell) worksheet.write(r,c,cell) workbook.save(filename[:-5] + '.xls') 4. 遍歷目錄下所有docx文件,并生成同名Excel文件 filenames = glob.glob('jtdoc/*.docx') for filename in filenames: tables = readdoc(filename) writeExcel(tables,filename) |
|
|