Tools to Capture and Convert the Web

Convert URLs and HTML to DOCX

Python API

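Adding the ability to convert HTML or web pages into Word documents to your application has never been easier than with GrabzIt's Python API. Before you start, however, remember that after calling the URLToDOCX, HTMLToDOCX or FileToDOCX methods, the Save or SaveTo method must be called to actually create the DOCX.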

Basic Options

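Capturing a web page as DOCX converts the entire web page into a Word document, which may span many pages. Only one parameter is required to convert a web page into a Word document or to convert HTML to DOCX, as shown in the following examples.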

grabzIt.URLToDOCX("https://www.tesla.com")
# Then call the Save or SaveTo method
grabzIt.HTMLToDOCX("<html><body><h1>Hello World!</h1></body></html>")
# Then call the Save or SaveTo method
grabzIt.FileToDOCX("example.html")
# Then call the Save or SaveTo method
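The three calls above differ only in the kind of input they accept, so an application may want a single entry point that picks the right one. The following is a minimal sketch under stated assumptions: `convert_to_docx` is a hypothetical helper (not part of the GrabzIt library), the URL/HTML/file detection rules are a simple heuristic, and `client` is expected to be an already constructed `GrabzItClient`.

```python
def convert_to_docx(client, source, output_path, options=None):
    """Hypothetical helper: choose URLToDOCX, HTMLToDOCX or FileToDOCX
    based on what `source` looks like, then call SaveTo.

    `client` is assumed to be a GrabzItClient instance created elsewhere.
    The detection rules below are a heuristic, not GrabzIt behaviour.
    """
    text = source.strip()
    # Pass the options object only when one was supplied.
    args = (source,) if options is None else (source, options)

    if text.lower().startswith(("http://", "https://")):
        client.URLToDOCX(*args)   # looks like a web address
    elif text.startswith("<"):
        client.HTMLToDOCX(*args)  # looks like an HTML string
    else:
        client.FileToDOCX(*args)  # otherwise treat it as a local file path

    # The conversion is only created once Save or SaveTo is called.
    return client.SaveTo(output_path)
```

With a client created as in the later examples on this page, `convert_to_docx(grabzIt, "https://www.tesla.com", "result.docx")` would take the URL branch and then call SaveTo.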

Custom Identifier

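You can pass a custom identifier to the DOCX methods as shown below; this value is then returned to your GrabzIt Python handler. For instance, the custom identifier could be a database identifier, allowing a DOCX document to be associated with a particular database record.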

from GrabzIt import GrabzItDOCXOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItDOCXOptions.GrabzItDOCXOptions()
options.customId = "123456"

grabzIt.URLToDOCX("https://www.tesla.com", options)
# Then call the Save method
grabzIt.Save("http://www.example.com/handler.py")

# Alternative: the same option when converting an HTML string
from GrabzIt import GrabzItDOCXOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItDOCXOptions.GrabzItDOCXOptions()
options.customId = "123456"

grabzIt.HTMLToDOCX("<html><body><h1>Hello World!</h1></body></html>", options)
# Then call the Save method
grabzIt.Save("http://www.example.com/handler.py")

# Alternative: the same option when converting a local HTML file
from GrabzIt import GrabzItDOCXOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItDOCXOptions.GrabzItDOCXOptions()
options.customId = "123456"

grabzIt.FileToDOCX("example.html", options)
# Then call the Save method
grabzIt.Save("http://www.example.com/handler.py")
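On the receiving side, the handler needs to read the custom identifier back out of the callback request. The sketch below shows one way to do that with the standard library only. It assumes the callback arrives as a URL whose query string carries the capture id and the custom identifier under the parameter names `id` and `customid`; these names are assumptions and should be checked against the GrabzIt handler documentation. `parse_callback` is a hypothetical helper.

```python
from urllib.parse import urlparse, parse_qs


def parse_callback(callback_url):
    """Hypothetical helper: pull the capture id and custom identifier
    out of a callback URL.

    The query parameter names "id" and "customid" are assumptions.
    """
    query = parse_qs(urlparse(callback_url).query)
    # parse_qs maps each name to a list of values; take the first, if any.
    return {
        "capture_id": query.get("id", [None])[0],
        "custom_id": query.get("customid", [None])[0],
    }
```

The returned `custom_id` could then be used as the key for looking up the database record that the DOCX document belongs to.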

Headers and Footers

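To add a header or footer to a Word document, you can request that a particular template be applied to the DOCX being generated. The template must be saved beforehand; it specifies the contents of the header and footer along with any special variables. In the example code below, the user applies a template they created called "my template".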

from GrabzIt import GrabzItDOCXOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItDOCXOptions.GrabzItDOCXOptions()
options.templateId = "my template"

grabzIt.URLToDOCX("https://www.tesla.com", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.docx")

# Alternative: the same template when converting an HTML string
from GrabzIt import GrabzItDOCXOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItDOCXOptions.GrabzItDOCXOptions()
options.templateId = "my template"

grabzIt.HTMLToDOCX("<html><body><h1>Hello World!</h1></body></html>", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.docx")

# Alternative: the same template when converting a local HTML file
from GrabzIt import GrabzItDOCXOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItDOCXOptions.GrabzItDOCXOptions()
options.templateId = "my template"

grabzIt.FileToDOCX("example.html", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.docx")

Convert an HTML Element to DOCX

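If you only want to convert a single HTML element, such as a div or span, directly into a Word document, you can do so with GrabzIt's Python library. You must pass the CSS selector of the HTML element you wish to convert to the targetElement attribute of the GrabzItDOCXOptions class.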

...
<span id="Article">
<p>This is the content I am interested in.</p>
<img src="myimage.jpg">
</span>
...

In this example, we wish to capture all the content in the span that has the id Article, so we pass it to the GrabzIt API as shown below.

from GrabzIt import GrabzItDOCXOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItDOCXOptions.GrabzItDOCXOptions()
options.targetElement = "#Article"

grabzIt.URLToDOCX("http://www.bbc.co.uk/news", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.docx")
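When the HTML is under your own control, it can be useful to confirm that the targeted element exists before requesting a conversion. The following is a minimal sketch using only Python's standard `html.parser`; `has_element_with_id` is a hypothetical helper, it handles only simple `#id` selectors such as `#Article`, and it is a local pre-check that says nothing about how GrabzIt itself resolves selectors.

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Collects the value of every id attribute seen in the markup."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_with_id(html, selector):
    """Hypothetical pre-check: True if `html` contains an element whose id
    matches a simple "#id" selector."""
    if not selector.startswith("#"):
        raise ValueError("this sketch only handles simple #id selectors")
    collector = _IdCollector()
    collector.feed(html)
    return selector[1:] in collector.ids
```

For the snippet shown earlier, `has_element_with_id(html, "#Article")` would be true, so setting `options.targetElement = "#Article"` targets an element that is actually present.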