Refactoring and Optimization of the POI User Model

2019-10-08 06:43:30  Ji Haojie (吉豪杰), Song Xinchao (宋欣潮)

Software (軟件), 2019, No. 5


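Abstract: The user model (UserModel) in Apache POI is currently the most widely used technique for processing Excel data, but it has a number of obvious drawbacks. Taking a student archive management system as an example, this paper analyzes the problems in the user model and their causes, and addresses them by optimizing and improving the user model with a process-oriented design idea. The user model before and after the improvement is tested with data sets of different sizes, and the test results are compared and analyzed. In the end, the drawbacks of the user model are resolved to a certain extent and its performance is improved.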

Keywords: user model; producer-consumer model; POI; data optimization; memory overflow; design pattern; Java multithreading

CLC number: TP315    Document code: A    DOI: 10.3969/j.issn.1003-6970.2019.05.038

Citation: Ji Haojie, Song Xinchao. Refactoring and optimization of the POI user model[J]. Software, 2019, 40(5): 193-199

[Abstract]: The UserModel in Apache POI is currently the most widely used technology for processing Excel data, but it has many obvious disadvantages. Taking a student archive management system as an example, this paper analyzes the problems in the UserModel and their causes. To address these problems, the UserModel is optimized and improved using a process-oriented design idea. Data sets of different sizes are used to test the UserModel before and after the improvement, and the test results are compared and analyzed. Finally, the disadvantages of the UserModel are resolved to some extent and its performance is improved.

[Key words]: UserModel; producer-consumer model; POI; data optimization; OutOfMemoryError; design pattern; Java multithreading

0 Introduction

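In current software development, more and more requirements involve processing Microsoft Office documents, and processing Excel files is especially common. Discussion and research on Excel file processing has therefore intensified. Many technologies and open-source projects for processing Excel files have appeared, such as Java Excel API (jxl), Apache POI [1] and Alibaba EasyExcel. Each has its own strengths, suits different development scenarios and meets most development needs, but each also has problems; the problems of Apache POI are the focus of this paper. The novelty of this paper lies in applying the design idea of the process-oriented producer-consumer model [2] to the traditional user model and implementing the user model with multiple threads [3-9]. The traditional user-model program for processing Excel data is refactored so that its program structure is logically clearer, its responsibilities are more clearly defined, and it processes data more efficiently.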
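The producer-consumer idea named above can be sketched with the JDK's java.util.concurrent package. This is a minimal sketch under our own assumptions, not the authors' implementation: rows are modeled as plain List<String>, a producer thread puts them on a bounded ArrayBlockingQueue, a consumer thread takes them off and groups them into batches, and a sentinel row marks the end of input. The names RowPipeline and POISON, the queue capacity of 16 and the batch size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not the authors' code: a bounded queue decouples reading
// rows (producer) from processing them (consumer) so both run concurrently.
class RowPipeline {
    // Sentinel row telling the consumer that the producer has finished.
    static final List<String> POISON = new ArrayList<>();

    /** Runs one producer and one consumer thread; returns the size of every batch the consumer flushed. */
    static List<Integer> run(List<List<String>> rows, int batchSize) {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(16);
        List<Integer> flushed = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (List<String> row : rows) {
                    queue.put(row);                  // blocks while the queue is full
                }
                queue.put(POISON);                   // end-of-input marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            List<List<String>> batch = new ArrayList<>();
            try {
                while (true) {
                    List<String> row = queue.take(); // blocks while the queue is empty
                    if (row == POISON) {             // identity check: an empty data row is not the sentinel
                        break;
                    }
                    batch.add(row);
                    if (batch.size() == batchSize) { // a real system would run one batch insert here
                        flushed.add(batch.size());
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    flushed.add(batch.size());       // flush the final partial batch
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();                         // join() makes the consumer's writes to 'flushed' visible here
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return flushed;
    }
}
```

The bounded queue is the key design choice in this sketch: when the consumer (for example, database writes) falls behind, put blocks the producer, so intermediate row objects awaiting processing cannot accumulate without limit.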

1 The POI User Model

1.1 Introduction to the User Model

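POI is a free, open-source, cross-platform Java API written in Java and provided by the Apache Software Foundation; it gives Java programs the ability to read and write files in Microsoft Office formats [1]. The parts of POI that deal with Excel data mainly comprise the User API, the Event API and the Streaming UserModel API. This paper focuses on the UserModel in the User API.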

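The UserModel is essentially DOM-based parsing: the entire file is read into memory and its internal structure is modeled as a DOM tree, as shown in Fig. 1 (DOM tree structure of the user model).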

As Fig. 1 shows, the user model provides ready-made Workbook, Sheet, Row, Cell and other objects for reading and writing Excel data.
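Because the whole tree is built before any cell is read, memory use grows with the total number of cells in the workbook. The toy model below only illustrates that containment and the index-based navigation it allows; the record names mirror POI's Workbook, Sheet, Row and Cell, but they are hypothetical stand-ins, not the POI API. It requires Java 16 or later (records).

```java
import java.util.List;

// Hypothetical toy model (not the POI classes): each level of the DOM tree
// owns its children, so the whole workbook is resident in memory at once.
class DomTreeSketch {
    record Cell(Object value) {}
    record Row(List<Cell> cells) {}
    record Sheet(String name, List<Row> rows) {}
    record Workbook(List<Sheet> sheets) {}

    /** Walks Workbook -> Sheet -> Row -> Cell and counts the cell objects held in memory. */
    static long countCells(Workbook wb) {
        long n = 0;
        for (Sheet s : wb.sheets()) {
            for (Row r : s.rows()) {
                n += r.cells().size();
            }
        }
        return n;
    }

    /** Random access by index, analogous to getSheetAt(i), getRow(j), getCell(k) navigation. */
    static Object valueAt(Workbook wb, int sheet, int row, int col) {
        return wb.sheets().get(sheet).rows().get(row).cells().get(col).value();
    }
}
```

Random access is cheap only because everything is already loaded; that same property is what makes large files expensive in the user model.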

1.2 Applying the User Model

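Having briefly introduced the user model, we now take the student archive management system mentioned in reference [4] as an example and implement its Excel data parsing function with the user model. The business process of this function is: ① receive the Excel file as a stream; ② create a Workbook object from the received file; ③ following the DOM structure, traverse every Row of every Sheet, read the value of each Cell in the Row, and store the values in a list; ④ convert the types of the data in the list and wrap them in Student domain objects, producing a collection of Student objects; ⑤ persist the Student collection to the database with batch insert operations (which involve multiple tables).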
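Steps ④ and ⑤ can be illustrated without POI. The sketch assumes, hypothetically, that step ③ delivers numeric cells as Double (POI's Cell.getNumericCellValue() returns a double) and text cells as String, and that the first three columns hold the student ID, name and sex. StudentRow is a simplified stand-in for the Student class, and partition shows one way to cut the list into fixed-size groups for batch inserts. It requires Java 16 or later (records and pattern matching for instanceof).

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of steps 4 and 5: convert raw cell values into typed
// fields, then split the result into fixed-size batches for batch inserts.
class RowConversionSketch {
    // Simplified stand-in for the Student class (only three fields).
    record StudentRow(BigInteger stuId, String stuName, String stuSex) {}

    // Numeric cells are assumed to arrive as Double (e.g. 2015001.0), text cells as String.
    static BigInteger toBigInteger(Object v) {
        if (v instanceof Double d) {
            // toBigIntegerExact() throws ArithmeticException on a fractional part instead of truncating
            return BigDecimal.valueOf(d).toBigIntegerExact();
        }
        return new BigInteger(v.toString().trim());
    }

    static StudentRow convert(List<Object> row) {
        // Assumed column order: student ID, name, sex.
        return new StudentRow(toBigInteger(row.get(0)),
                row.get(1).toString().trim(),
                row.get(2).toString().trim());
    }

    /** Splits items into consecutive groups of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

Using BigDecimal rather than a plain (long) cast means a malformed ID such as 2015001.5 raises an error instead of being silently truncated.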

User model code:

Student class:

import java.math.BigInteger;

public class Student {
    private BigInteger stuId;       // student ID
    private Archive archive;        // archive record
    private Profession pro;         // major
    private Department department;  // department
    private String stuName;         // name
    private String stuSex;          // sex
    private String stuSendnum;      // dispatch certificate number
    private String stuClass;        // class
    private String stuLocation;     // place of origin
    // some fields and the getter/setter methods are omitted
}

User model method for parsing Excel:

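public static List<List<Object>> getListByExcel(InputStream in, String fileName) throws Exception {
    List<List<Object>> list = null;
    Workbook wb = getWorkBook(in, fileName); // obtain the Workbook object
    if (wb != null) {
        Sheet sheet = null;
        Row row = null;
        Cell cell = null;
        list = new ArrayList<List<Object>>();
        // iterate over every sheet
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            sheet = wb.getSheetAt(i);
            if (sheet == null) { continue; }
            // iterate over every row
            for (int j = sheet.getFirstRowNum(); j <= sheet.getLastRowNum(); j++) {
                row = sheet.getRow(j);
                if (row == null) { continue; } // getRow returns null for rows that were never created
                Integer columns = (int) row.getLastCellNum();
                // iterate over every column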

List