OpenOLAT / OO-667

Externalize PDF text extraction

    Details

      Description

      As with thumbnail generation, PDF text extraction can run amok (consuming unbounded RAM or up to 400% CPU) when the library has problems with a PDF file. The solution is to run the extraction in an external process. Two new properties in olat.local.properties control this:

      search.pdf.external=true
      search.pdf.external.command=/Users/srosse/Downloads/convertpdf.sh

      The scripts folder contains an example of a script that extracts the text with PDFBox.

      To mimic OpenOLAT's internal process, we use a custom build of pdfbox-app with a slightly modified ExtractText class.
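
      As a rough illustration of why externalizing helps, the sketch below runs a configured command in a separate process and kills it if it exceeds a time limit, so a problematic PDF cannot exhaust the memory or CPU of the main JVM. It is not OpenOLAT's actual implementation: the class name ExternalPdfExtractor, the timeout parameter, and the convention that the script receives the input PDF path and an output text path as its two arguments are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of OpenOLAT.
public class ExternalPdfExtractor {

    /**
     * Runs the external command as: command... <pdf> <outputFile>
     * (this argument convention is an assumption) and returns the text
     * written to the output file, or null if the command timed out or
     * exited with a non-zero status.
     */
    public static String extract(List<String> command, Path pdf, long timeoutSeconds)
            throws IOException, InterruptedException {
        Path out = Files.createTempFile("pdftext", ".txt");
        try {
            List<String> cmd = new ArrayList<>(command);
            cmd.add(pdf.toString());
            cmd.add(out.toString());

            // Discard the child's console output so a full pipe buffer
            // can never block it.
            Process process = new ProcessBuilder(cmd)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();

            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                // Runaway extraction: kill child processes first (a shell
                // script typically starts its own JVM), then the script.
                process.descendants().forEach(ProcessHandle::destroyForcibly);
                process.destroyForcibly();
                process.waitFor();
                return null;
            }
            if (process.exitValue() != 0) {
                return null;
            }
            return Files.readString(out);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

      With the properties above, a caller might pass the value of search.pdf.external.command as the single element of the command list. The memory of the child JVM can then be capped independently, for example with an -Xmx option inside the script.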

    People

      Assignee: Stéphane Rossé (srosse)
      Reporter: Matthai Kurian (mkurian)
      Tester: Joël Krähemann

    Time Tracking

      Estimated: 0m
      Remaining: 0m
      Logged: 3h 30m