Some PDFs have actual text embedded in them; when that is the case, we extract the text automatically. Where the text is stored in the PDF as "stroked lines" rather than as text, we have two options:
1) Try to interpret the stroked lines. That is exceedingly hard to do reliably because you get all sorts of representations in the PDF files. We had a team working on that for quite a while, and they felt it was nearly impossible to get it 100% right, and we have to be extremely careful about "wrong numbers".
2) Use OCR on the image itself to identify the text elements and extract them. I have tried a number of OCR products with limited success: they all need characterization before they can do anything, and I have seen varying degrees of accuracy in the results.
This is not to say that we cannot or will not do this, only that it is not as easy as it sounds and that we have tried. I will take another look at this over the next few weeks to see if I can find a workable solution in TBC.
Alan