Trimble Business Center


Feature Request - Extract Text from PDF

  • 1.  Feature Request - Extract Text from PDF

    Posted 11-22-2018 12:43

    I work in the estimating department doing site takeoff. At the tender stage we very rarely have access to CAD files; we usually get PDFs, often of very low quality. For most of the PDFs I get, importing the vector data converts the elevation text to polylines, so I can't use it for spot elevations. It would be nice to have a function that extracts elevation text from the PDF and converts it to usable points or multiline text that TBC could use for spot elevations. I know of at least one software package that does this, and I was wondering whether Trimble plans to add it in a future version.



  • 2.  Re: Feature Request - Extract Text from PDF

    Posted 11-24-2018 13:13

    Some PDFs come with actual text embedded in them; when that is the case, we extract the text automatically. Where the text is stored in the PDF as "stroked lines" rather than as text, we have to look at two options:

     

    1) Try to interpret the stroked lines. That is exceedingly hard to do reliably because you get all sorts of representations in PDF files. We had a team working on that for quite a while, and they felt that it was nearly impossible to get it 100% right, and we have to be very careful about "wrong numbers".

     

    2) Use OCR on the image itself to identify the text elements and extract them. I have tried a number of OCR products with limited success; they all need characterization before they can do anything, and I have seen varying degrees of accuracy in the results.
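
    For illustration only, and not a description of how TBC works: whichever option produces the text, the "wrong numbers" risk can be reduced by validating each candidate label before it becomes a point. Below is a minimal Python sketch, assuming the labels have already been extracted as (text, x, y) tuples. The names `SpotElevation`, `parse_spot_elevation`, and `extract_spot_elevations`, the number pattern, and the elevation range are all hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class SpotElevation(NamedTuple):
    x: float
    y: float
    z: float


# Assumed label format: optional sign, 1-4 integer digits, a decimal point,
# and 1-3 decimal digits. Adjust to match the drawings being processed.
ELEVATION_RE = re.compile(r"[+-]?\d{1,4}\.\d{1,3}")


def parse_spot_elevation(
    text: str, x: float, y: float, z_min: float, z_max: float
) -> Optional[SpotElevation]:
    """Return a point only if the label looks like an in-range elevation."""
    token = text.strip()
    if ELEVATION_RE.fullmatch(token) is None:
        return None  # not a clean number, e.g. an OCR misread such as "1O1.30"
    z = float(token)
    if not (z_min <= z <= z_max):
        return None  # numeric, but implausible for this site
    return SpotElevation(x, y, z)


def extract_spot_elevations(
    labels: List[Tuple[str, float, float]], z_min: float, z_max: float
) -> Tuple[List[SpotElevation], List[str]]:
    """Split labels into accepted points and rejected text for manual review."""
    points: List[SpotElevation] = []
    rejected: List[str] = []
    for text, x, y in labels:
        point = parse_spot_elevation(text, x, y, z_min, z_max)
        if point is None:
            rejected.append(text)
        else:
            points.append(point)
    return points, rejected
```

    Rejected labels are returned rather than silently dropped so that an estimator can review them by hand, which matters when a misread digit would otherwise become a bad spot elevation.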

     

    This is not to say that we cannot or will not do this, only that it is not as easy as it sounds and that we have tried. I will take another look at this over the next few weeks to see if I can find a solution that is workable in TBC.

     

    Alan



  • 3.  RE: Re: Feature Request - Extract Text from PDF

    Posted 12-11-2020 07:47
    Alan,

    Have there been any updates or advances in OCR technology that allow for better text extraction? In the plans I read on a daily basis, it's very hit or miss whether elevations come in as text or as CAD lines. We use Bluebeam daily, so I'm specifically wondering whether anyone in the Trimble community has explored OCR through Trimble and achieved better results since the last post on this thread two years ago.