What is the MIME type "text/vnd.hocr+html"?

A MIME type is a string that tells browsers and other tools how to handle a particular kind of file.

text/vnd.hocr+html identifies files that contain OCR results encoded as HTML. It uses the hOCR standard to embed details such as text layout and word positions directly in the HTML markup.
This format makes it easy for programs and browsers to display recognized text along with its spatial metadata. OCR engines such as Tesseract generate these files when converting images of text into searchable, selectable text.
Files using this MIME type are typically saved with the .html or .hocr extension.
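
To make the structure concrete, here is a minimal sketch that assembles a tiny hOCR page from word bounding boxes. The function names (buildHocr, escapeHtml), the { text, bbox } input shape, and the ocr-system value are hypothetical, and real engines such as Tesseract emit richer markup (for example ocr_carea and ocr_par wrappers and confidence values). What it shows is the core convention: hOCR classes on ordinary HTML elements, with OCR properties such as "bbox x0 y0 x1 y1" stored in the title attribute.

```javascript
// Minimal sketch: assemble a tiny hOCR page from word bounding boxes.
// buildHocr, escapeHtml, and the { text, bbox } input shape are hypothetical.
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return s.replace(/[&<>"']/g, (c) => map[c]);
}

// bbox values are [x0, y0, x1, y1] in image pixels (top-left origin).
const bboxTitle = (b) => `bbox ${b.join(' ')}`;

// Assumes at least one word; all words are placed on a single line.
function buildHocr(pageBbox, words) {
  const spans = words.map((w, i) =>
    `<span class="ocrx_word" id="word_${i + 1}" title="${bboxTitle(w.bbox)}">${escapeHtml(w.text)}</span>`);
  // The line's box is the union of its word boxes.
  const lineBbox = [
    Math.min(...words.map((w) => w.bbox[0])),
    Math.min(...words.map((w) => w.bbox[1])),
    Math.max(...words.map((w) => w.bbox[2])),
    Math.max(...words.map((w) => w.bbox[3])),
  ];
  return [
    '<!DOCTYPE html>',
    '<html><head><meta charset="utf-8">',
    // hOCR documents declare the producing system and the classes they use.
    '<meta name="ocr-system" content="hypothetical-ocr">',
    '<meta name="ocr-capabilities" content="ocr_page ocr_line ocrx_word">',
    '</head><body>',
    `<div class="ocr_page" title="${bboxTitle(pageBbox)}">`,
    `<span class="ocr_line" title="${bboxTitle(lineBbox)}">${spans.join(' ')}</span>`,
    '</div>',
    '</body></html>',
  ].join('\n');
}

console.log(buildHocr([0, 0, 800, 600], [
  { text: 'Hello', bbox: [10, 20, 80, 40] },
  { text: 'world', bbox: [90, 20, 160, 40] },
]));
```

Because the output is plain HTML, a browser renders just the words; the layout lives in the class and title attributes for other tools to read.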

Associated file extensions

.hocr, .html

Usage Examples

HTTP Header

When serving content with this MIME type, set the Content-Type header:


    Content-Type: text/vnd.hocr+html    
  

HTML

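In HTML, the type attribute of a link gives the browser an advisory hint about the linked resource's MIME type (the server's Content-Type header still determines how the response is handled):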


    <a href="scan.hocr" type="text/vnd.hocr+html">Download hOCR file</a>
  

Server-side (Node.js)

Setting the Content-Type header in Node.js:


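    const http = require('http');

    http.createServer((req, res) => {
      // Label the response body as hOCR (HTML carrying OCR layout data)
      res.setHeader('Content-Type', 'text/vnd.hocr+html');
      // Placeholder body: send your hOCR markup here
      res.end('Content here');
    }).listen(3000); // serve on http://localhost:3000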
  

FAQs

What is the purpose of the text/vnd.hocr+html MIME type?

This MIME type identifies HTML files that follow the hOCR standard, which embeds Optical Character Recognition (OCR) data into valid HTML. Software uses it to store the recognized text together with layout (bounding boxes) and confidence information. Because those coordinates map back to the scanned image, the text can later be overlaid on it, for example as an invisible layer that makes the scan searchable and selectable.

How do I open a text/vnd.hocr+html file?

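Because hOCR files are valid HTML, you can open them in any standard web browser such as Chrome, Firefox, or Edge. Saved with an .html extension (or served as text/html), the file renders like any web page: the recognized text is displayed, while the OCR metadata stored in class and title attributes does not appear as page text (browsers may show title values as hover tooltips). A file saved as .hocr or served as text/vnd.hocr+html may not be rendered directly, and the browser may offer to download it instead.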

Should I use the .html or .hocr file extension?

If the file is intended for direct viewing in a browser, the .html extension is preferred for immediate compatibility. However, using the .hocr extension helps developers and automated scripts distinguish these files from standard web pages.

How do I configure Apache or Nginx to serve .hocr files correctly?

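Servers that don't recognize the .hocr extension fall back to a generic default (such as text/plain or application/octet-stream, depending on configuration) or send no type at all. For Apache, add the directive AddType text/vnd.hocr+html .hocr to your configuration. For Nginx, add the entry text/vnd.hocr+html hocr; inside the types { ... } block (typically in mime.types). Reload the server afterwards so the correct Content-Type header is sent.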
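
If you serve files from your own Node.js code instead of Apache or Nginx, the same extension-to-type mapping applies. The sketch below assumes a small hand-written lookup table (CONTENT_TYPES and contentTypeFor are hypothetical names, not part of any library) with a generic binary fallback for unknown extensions.

```javascript
const path = require('path');

// Hypothetical extension-to-type table for a small static file server.
// .hocr gets the hOCR type; .html stays text/html so browsers render it.
const CONTENT_TYPES = {
  '.hocr': 'text/vnd.hocr+html',
  '.html': 'text/html; charset=utf-8',
  '.png': 'image/png',
};

function contentTypeFor(filename) {
  // Compare extensions case-insensitively (SCAN.HOCR == scan.hocr).
  const ext = path.extname(filename).toLowerCase();
  // Fall back to a generic binary type when the extension is unknown.
  return CONTENT_TYPES[ext] || 'application/octet-stream';
}

console.log(contentTypeFor('scan-001.hocr')); // text/vnd.hocr+html
console.log(contentTypeFor('SCAN-001.HOCR')); // text/vnd.hocr+html
console.log(contentTypeFor('notes.xyz'));     // application/octet-stream
```

In a request handler you would pass the result to res.setHeader('Content-Type', ...) before streaming the file.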

How is this MIME type different from standard text/html?

While both contain HTML markup, text/vnd.hocr+html specifically signals that the document contains hOCR microformats, such as class='ocr_line' on elements and title='bbox x0 y0 x1 y1' properties. This distinction helps tools that need spatial text data, such as PDF generators, search indexers, or OCR post-processing scripts, recognize the file without first inspecting its markup.
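
Reading the spatial data back out mostly means parsing those title attributes. hOCR stores properties as semicolon-separated "name value..." entries, for example "bbox 10 20 80 40; x_wconf 93" (x_wconf is the word confidence Tesseract emits). The sketch below is a simplified parser under that assumption; parseHocrTitle is a hypothetical name, and it does not handle semicolons inside quoted values such as image file names.

```javascript
// Sketch: parse an hOCR title attribute into a properties object.
// bbox is converted to numbers; other values are kept as strings.
// Simplification: semicolons inside quoted values are not supported.
function parseHocrTitle(title) {
  const props = {};
  for (const part of title.split(';')) {
    const tokens = part.trim().split(/\s+/);
    if (tokens[0] === '') continue; // skip empty segments
    const [name, ...values] = tokens;
    props[name] = name === 'bbox' ? values.map(Number) : values.join(' ');
  }
  return props;
}

const props = parseHocrTitle('bbox 10 20 80 40; x_wconf 93');
console.log(props.bbox);    // [ 10, 20, 80, 40 ]
console.log(props.x_wconf); // 93
```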

Can I convert text/vnd.hocr+html to PDF?

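Yes. hOCR files are frequently used as an intermediate step to create searchable PDFs. Tools such as hocr2pdf (part of ExactImage) or hocr-pdf (part of hocr-tools) read the layout data from the hOCR file and place the recognized text as an invisible layer over the original image, producing a PDF whose text can be searched and selected. Tesseract can also write such a PDF directly from an image, without a separate hOCR step.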

Are there security concerns with hOCR files?

Yes, because hOCR files are HTML, they can contain executable JavaScript. If your application accepts hOCR uploads from users, you must sanitize the files to prevent Cross-Site Scripting (XSS) attacks before rendering them in a browser.
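
Sanitization is the primary defense, but restrictive response headers can limit the damage if something slips through. The sketch below is one possible defense-in-depth setup, not a complete policy; untrustedHocrHeaders is a hypothetical helper, and you should tune the Content-Security-Policy to what your pages actually need (for example, allowing the page image).

```javascript
// Defense-in-depth sketch for serving user-uploaded hOCR. These headers do
// not replace sanitization; they restrict what injected markup can do.
function untrustedHocrHeaders() {
  return {
    'Content-Type': 'text/vnd.hocr+html',
    // Disallow all scripts and plugins, and sandbox the document so the
    // browser treats it as a unique (opaque) origin.
    'Content-Security-Policy': "script-src 'none'; object-src 'none'; sandbox",
    // Tell browsers not to MIME-sniff the response into another type.
    'X-Content-Type-Options': 'nosniff',
  };
}

// Usage inside an http.createServer handler (sanitizedHocr is hypothetical):
//   res.writeHead(200, untrustedHocrHeaders());
//   res.end(sanitizedHocr);
console.log(untrustedHocrHeaders()['Content-Security-Policy']);
```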

General FAQ

What is a MIME type?

A MIME (Multipurpose Internet Mail Extensions) type is a label that indicates the nature and format of a document, file, or assortment of bytes. The structure of MIME types and the procedures for registering them are specified in IETF RFC 6838, and registered types are listed by IANA.

MIME types are important because they help browsers and servers understand how to process a file. When a browser receives a file from a server, it uses the MIME type to determine how to display or handle the content, whether it's an image to display, a PDF to open in a viewer, or a video to play.

MIME types consist of a type and a subtype, separated by a slash (e.g., text/html, image/jpeg, application/pdf). Some MIME types also include optional parameters.
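
As a worked example, text/vnd.hocr+html has the type text and the subtype vnd.hocr+html; the vnd. prefix places it in the vendor tree defined by RFC 6838, and the +html suffix indicates the underlying HTML syntax. The sketch below splits a MIME type string into those parts. parseMimeType is a hypothetical helper and deliberately simplified: it ignores quoted parameter values and other edge cases that a full parser (such as the one in the WHATWG MIME Sniffing standard) handles.

```javascript
// Sketch: split a MIME type into type, subtype, and parameters.
// Type, subtype, and parameter names are case-insensitive, so they are
// lowercased; parameter values are kept as written.
function parseMimeType(value) {
  const [essence, ...paramParts] = value.split(';');
  const [type, subtype] = essence.trim().toLowerCase().split('/');
  const params = {};
  for (const part of paramParts) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip malformed parameters
    const name = part.slice(0, eq).trim().toLowerCase();
    params[name] = part.slice(eq + 1).trim();
  }
  return { type, subtype, params };
}

const m = parseMimeType('text/vnd.hocr+html; charset=utf-8');
console.log(m.type, m.subtype, m.params.charset); // text vnd.hocr+html utf-8
```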

How do I find the MIME type for a file?

You can check the file extension or use a file identification tool such as file --mime-type on the command line. Many programming languages also provide libraries to detect MIME types.
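
Generic detectors such as file --mime-type typically report hOCR as ordinary text/html, because it is HTML. If you need to tell the two apart, one option is to inspect the markup for hOCR-specific features such as an ocr_page element or an ocr-system meta tag. The heuristic below is an assumption, not an official detection algorithm, and looksLikeHocr and mimeTypeForHtml are hypothetical names.

```javascript
// Heuristic sketch: look for hOCR-specific markers in an HTML string.
// These checks are assumptions, not an official detection algorithm.
function looksLikeHocr(html) {
  // An element whose class list contains the hOCR page class.
  const hasOcrPage = /class\s*=\s*["'][^"']*\bocr_page\b/.test(html);
  // The meta tag hOCR documents use to name the producing OCR system.
  const hasOcrSystemMeta = /name\s*=\s*["']ocr-system["']/.test(html);
  return hasOcrPage || hasOcrSystemMeta;
}

// Refine an HTML file's type based on its content.
function mimeTypeForHtml(html) {
  return looksLikeHocr(html) ? 'text/vnd.hocr+html' : 'text/html';
}

console.log(mimeTypeForHtml("<div class='ocr_page' title='bbox 0 0 800 600'></div>")); // text/vnd.hocr+html
console.log(mimeTypeForHtml('<p class="intro">Hello</p>'));                             // text/html
```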

Why are multiple MIME types listed for one extension?

Different applications and historical conventions may use alternative MIME identifiers for the same kind of file. Showing them all helps ensure compatibility across systems.