Java - Use ZipInputStream to read URL ZIP file

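Because I needed to read the Taiwan Railways train timetable open data, I wanted to read the XML file inside the zip archive without first downloading the archive, and then parse that ZipInputStream with DocumentBuilder.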

In principle, the Taiwan Railways open data can be downloaded from this URL: http://163.29.3.98/xml/20140218.zip

The overall flow is as follows:

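First, use a Java URL to connect to the link and open a stream. Next, call getNextEntry on a ZipInputStream to step through the entries in the ZIP archive, checking whether each entry is the 20100218.xml file we want. If it is, pass the current ZipInputStream object to the method that parses the XML.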
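This flow can be tried without network access. The following is only a minimal sketch, assuming an in-memory archive in place of the Taiwan Railways download. The class name ZipXmlSketch, the sample XML, and parseRootElementName (a stand-in for the parsing method, whose body this post does not show) are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class ZipXmlSketch {

    // Builds a one-entry ZIP archive in memory so the sketch needs no network.
    static byte[] buildSampleZip(String entryName, String xml) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Advances the stream to the entry named target; false if it is absent.
    static boolean seekToEntry(ZipInputStream zip, String target) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (target.equals(entry.getName())) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical stand-in for the XML parsing method: parses the entry the
    // stream is positioned at and returns the name of its root element.
    static String parseRootElementName(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        return doc.getDocumentElement().getNodeName();
    }

    // Ties the steps together; returns null when the entry is not in the archive.
    static String rootNameOfEntry(byte[] zipBytes, String target) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            return seekToEntry(zip, target) ? parseRootElementName(zip) : null;
        }
    }
}
```

ZipInputStream reports end-of-stream at the end of the current entry, which is why the parser can be handed the stream directly and only sees that one XML file.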

The relevant code is as follows:

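import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public static boolean getZIPContentByStream(URL ZipURL, String target){
 try{
     URLConnection connection = ZipURL.openConnection();
     // A timeout of zero is interpreted as an infinite timeout.
     connection.setConnectTimeout(0);
     InputStream inputStream = connection.getInputStream();
     // try-with-resources closes the ZIP stream (and the stream it wraps) when done.
     try (ZipInputStream zipInputStream = new ZipInputStream(inputStream)) {
         // Advance entry by entry until the target is found or the archive ends.
         ZipEntry zipEntry = zipInputStream.getNextEntry();
         while(zipEntry != null && !target.equals(zipEntry.getName())) {
             zipEntry = zipInputStream.getNextEntry();
         }
         if(zipEntry != null) {
             // The stream is now positioned at the start of the target entry.
             parseTrainXMLFile(zipInputStream);
         }
     }
 }catch (FileNotFoundException e) {
     // No file exists at this URL.
     System.out.println("The Web "+target+" file does not exist!");
     return false;
 }catch (IOException e) {
     e.printStackTrace();
 }
 return true;
}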

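This method takes the object created with new URL (the link to connect to) and the name of the entry you want to retrieve (20100218.xml).

If the zip file exists, it returns true to the calling method, which then passes in the next parameter and repeats the access in a loop. This matters because Taiwan Railways only provides 45 days of data for download at a time.

Through this process, you can keep the Taiwan Railways timetable up to date with the latest open data.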
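The calling loop is not shown in this post. As a rough sketch only, it might look like the following, assuming the archives are named by date (yyyyMMdd.zip, as in the URL above) and that the caller stops at the first missing file. The class TimetableUrlSketch, both method names, and the stop-at-first-miss rule are assumptions for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class TimetableUrlSketch {

    // Base address taken from the URL quoted in this post.
    static final String BASE = "http://163.29.3.98/xml/";

    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20140218.
    static final DateTimeFormatter STAMP = DateTimeFormatter.BASIC_ISO_DATE;

    // Builds one candidate ZIP URL per day, starting at the given date.
    static List<String> dailyZipUrls(LocalDate start, int days) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            urls.add(BASE + start.plusDays(i).format(STAMP) + ".zip");
        }
        return urls;
    }

    // Calls the fetcher on each URL in order and stops at the first one that
    // reports false (archive missing). Returns how many were fetched.
    static int fetchUntilMissing(List<String> urls, Predicate<String> fetcher) {
        int fetched = 0;
        for (String url : urls) {
            if (!fetcher.test(url)) {
                break;
            }
            fetched++;
        }
        return fetched;
    }
}
```

In real use the fetcher would wrap a call to getZIPContentByStream with a URL built from each string, so its boolean return value decides whether the loop continues.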

==========================================================

Side note

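I originally wanted to download the zip file inside Google App Engine, but I had forgotten that the whitelist does not include the FileOutputStream class, so I had to change strategy. I have heard that Google Storage is the companion solution, but it seems to require payment, so I switched to reading the URL and processing the stream directly.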

PS. Please note!

Originally I used ZipURL.openStream() to get the InputStream object.

However, the version below is safer, because it avoids timeouts caused by latency on the development console (where a java.net.SocketTimeoutException was being thrown).

URLConnection connection = ZipURL.openConnection();
connection.setConnectTimeout(0);
InputStream inputStream = connection.getInputStream();

setConnectTimeout(0) // A timeout of zero is interpreted as an infinite timeout.
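An infinite timeout can leave a request hanging if the server never answers. URLConnection also has setReadTimeout, which limits how long a read waits for data once the connection is open. It is a second place a SocketTimeoutException can come from. A small sketch with finite limits follows. The helper name openWithTimeouts and the class TimeoutSketch are made up here, and any values passed in are the caller's choice, not settings from this post.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

class TimeoutSketch {

    // Prepares a connection with explicit limits. openConnection() only
    // creates the object; the network is contacted later, e.g. by
    // getInputStream().
    static URLConnection openWithTimeouts(URL url, int connectMillis, int readMillis)
            throws IOException {
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(connectMillis); // limit for establishing the connection
        connection.setReadTimeout(readMillis);       // limit for each blocking read
        return connection;
    }
}
```

Calling getInputStream() on the returned connection then yields the stream to wrap in a ZipInputStream, exactly as in the three lines above.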

Zhi-Bin's 談天說地
