OpenOLAT / OO-667

Externalize PDF text extraction


    Details

      Description

      As with thumbnail generation, PDF text extraction can run amok (needing unlimited RAM or 400% CPU) when the library has problems with a PDF file. The solution is to run the extraction as an external process. There are 2 new properties in olat.local.properties:

      search.pdf.external=true
      search.pdf.external.command=/Users/srosse/Downloads/convertpdf.sh
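A minimal Java sketch of how the two properties could be read, assuming `java.util.Properties` parsing. The class `PdfExtractionSettings`, the defaults (`false` and an empty command when a key is absent) and the fallback to in-process extraction when no command is configured are all hypothetical, not taken from the OpenOLAT code:

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical holder for the two olat.local.properties keys named in OO-667.
final class PdfExtractionSettings {
    static final String EXTERNAL_KEY = "search.pdf.external";
    static final String COMMAND_KEY = "search.pdf.external.command";

    final boolean external;
    final String command;

    private PdfExtractionSettings(boolean external, String command) {
        this.external = external;
        this.command = command;
    }

    // Parses the properties; "false" and "" are assumed defaults for absent keys.
    static PdfExtractionSettings load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        boolean external = Boolean.parseBoolean(props.getProperty(EXTERNAL_KEY, "false").trim());
        String command = props.getProperty(COMMAND_KEY, "").trim();
        return new PdfExtractionSettings(external, command);
    }

    // External extraction only makes sense when a command is configured; otherwise
    // the caller would fall back to in-process extraction (an assumption).
    boolean useExternal() {
        return external && !command.isEmpty();
    }
}
```

With the two lines above loaded, `useExternal()` returns `true`; with only `search.pdf.external=true` it returns `false`.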

      The scripts folder contains an example of the script, which extracts the text with PDFBox.

      To mimic OpenOLAT's internal extraction process, we use a custom build of the pdfbox-app with a slightly modified ExtractText class.
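The ticket does not say how OpenOLAT calls the script. A minimal Java sketch of the idea, assuming a hypothetical contract in which the script receives the PDF path and an output text file path as its two arguments and exits with 0 on success: the child process carries the RAM and CPU load, and a timeout (also an assumption) lets the caller kill it if it runs amok. `ExternalPdfTextExtractor` is illustrative, not OpenOLAT's actual class:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical caller for the external extraction script. The argument order
// (input PDF, output text file) and the timeout are assumptions.
final class ExternalPdfTextExtractor {
    private final String command;
    private final long timeoutSeconds;

    ExternalPdfTextExtractor(String command, long timeoutSeconds) {
        this.command = command;
        this.timeoutSeconds = timeoutSeconds;
    }

    // Returns the extracted text, or null if the script failed or ran too long.
    String extract(Path pdf) throws IOException, InterruptedException {
        Path out = Files.createTempFile("pdf-text-", ".txt");
        try {
            Process process = new ProcessBuilder(List.of(command, pdf.toString(), out.toString()))
                    .redirectErrorStream(true)                       // merge stderr into stdout...
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // ...and discard both so no pipe fills up
                    .start();
            // The JVM only waits here; memory and CPU are consumed by the child process.
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly(); // kill a runaway extraction
                process.waitFor();
                return null;
            }
            if (process.exitValue() != 0) {
                return null;
            }
            return Files.readString(out, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

Discarding the child's output keeps the sketch short; a real caller would probably log it to diagnose which PDF files make the library misbehave.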

            People

            Assignee:
            srosse Stéphane Rossé
            Reporter:
            mkurian Matthai Kurian
            Tester:
            Joël Krähemann
            Votes:
            0
            Watchers:
            0

              Dates

              Created:
              Updated:
              Resolved:

                Time Tracking

                Estimated:
                0m
                Remaining:
                0m
                Logged:
                3h 30m