Hit Highlighting inside Adobe PDFs using SharePoint Search

So, I’ve been posting about some work I’ve been doing with SharePoint Search. Now, to pull it all together: I’ve been trying to do hit highlighting inside Adobe Reader, so that if a user clicks on a search result, Adobe Reader opens and automatically searches for occurrences of the search term. Doing this requires knowing your file extension (so you only process PDFs this way), knowing the query string parameter for what is being searched for, and knowing that you can pass parameters into Adobe Reader. I’m not going to go too much into that last part, but you can find documentation about this on Adobe’s website (pdf).

So, what we want to do is for each search result:

  • see if it is a PDF
  • if it is, pass the search phrase in the url

Um, I guess I’ll skip to the code:

XSL for PDF Hit Highlighting via URL

You can get this as a text file here.

So, what does this show? Firstly, you can see that the query string is being passed into our Search Core Results web part, as described previously. We’re also selecting the FileExtension column, also as described previously. Our XSL looks to see if the extension is PDF, and if so it emits the url to the item plus:

#search='query'

These bits are highlighted in yellow. Otherwise, we just output the link. (Note: this page doesn’t really use a lot of the neat features of your search results XML, like the hit-highlighted title).
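
As a rough sketch of that logic (not the exact stylesheet pictured above), something like the following would do it. The names here are assumptions to check against your own setup: a parameter called query holding the search phrase, a root element All_Results containing Result elements, and child elements url, title and fileextension whose extension value is exactly PDF.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the search phrase is passed in as a parameter named "query". -->
  <xsl:param name="query" />

  <xsl:template match="/">
    <!-- Assumption: results arrive as All_Results/Result elements. -->
    <xsl:for-each select="All_Results/Result">
      <div>
        <xsl:choose>
          <!-- PDF result: append Adobe's search open parameter to the link. -->
          <xsl:when test="fileextension = 'PDF'">
            <a href="{url}#search='{$query}'">
              <xsl:value-of select="title" />
            </a>
          </xsl:when>
          <!-- Anything else: just output the plain link. -->
          <xsl:otherwise>
            <a href="{url}">
              <xsl:value-of select="title" />
            </a>
          </xsl:otherwise>
        </xsl:choose>
      </div>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The curly braces inside href are attribute value templates, so the item URL and the query are substituted when the link is written out. The comparison is case-sensitive, so a result whose extension comes back as pdf would fall through to the plain link.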

So what happens then if a user clicks on this link? I searched for ‘Stupid Ferret’ in the example below…

PDF Hit Highlighting in Reader via URL

Tada – our search terms are searched for automatically when we open the PDF!

39 thoughts on “Hit Highlighting inside Adobe PDFs using SharePoint Search”

  1. Hi Andy,

    I don’t know if you are aware of BA-Insight’s Longitude, but it not only does hit-highlighting within PDF’s (or any other file type) it detects the most relevant pages in the document and presents those to the user without the need to download and open the document.

    Cheers!
    Martin

  2. This looks like good stuff, and I have verified that I can highlight by manually entering a query url. But where in the world do I put the xslt you’ve shown above?

    On the master page I have this, where the search box goes:

    Do I somehow need to associate this with a control? Is there something to do with the Core Results Page (which I am unable to locate !)? Does it require a blood sacrifice?

    I am using WSS 3.0, and I am pulling my hair out trying to get any kind of results.

    Thanks,
    Mark Edwards
    medwards@infassure.com

  3. Hi Mark,

    Ah, right, the XSLT goes in your results page. Go to the results page. Go Site Actions > Edit Page. Modify the Core Results Web Part. One of the options in the toolbar on the left is to modify the XSLT (it’s a button).

    Click the button and copy and change your XSLT!

    This post shows doing this:
    http://www.novolocus.com/2008/05/09/how-do-i-get-the-xml-of-my-search-results/

    The aim of the XSLT in that is to show the output of the Search Webservice.

    Regarding the master page, you’ll want to make sure that the search box on that points to the correct results page, though what that is depends upon your system; it’s hard for me to give instructions. Basically, all the search box does is forward you to the results page, and pass it your search term.

  4. Great post – helped me out in a pinch with a client, as we were having issues with KWizCom, their support, and, their product…

    Question – any thoughts on how to extend this to MS Office documents? I’ve done loads of searching, and have found nothing so far…

    Thanks!

  5. Hmm. Yes, I don’t know about hit highlighting within Office documents – is that even possible?

    The PDF highlighting was only possible because the Adobe Reader browser plugin would search based on a query string parameter, so essentially, it’s a feature of the client side application. I’ve never seen something like that in Office, I’m afraid!

    On the other hand, though, maybe you just mean hit highlighting in the search results page itself? If so, MOSS 2007 does this out of the box! This article shows you how to set up hit highlighting on document titles:

    http://www.novolocus.com/2008/05/19/hit-highlighting-in-sharepoint-search-document-titles/

    However, if you look at the XML you get back from the Search Web Service, you’ll see something like this:

    http://www.novolocus.com/wp-content/uploads/2008/05/hit-highlighting-properties.png

    Note that (although not highlighted) there is a Hit Highlighted Summary section, as well as Highlighted Title and URL.

    So, just edit your XSL for your Core Results Web Part to use them. Well, I say ‘just’ – it’s a bit complicated, but I’m sure you can figure it out.

  6. Andy – thanks for the response. I have hit highlighting working as needed, as well as the PDF highlighting thanks to your article. Office docs can be done – just not sure how.

    KWizCom, as I had mentioned, does do it (http://www.kwizcom.com/ProductPage.asp?ProductID=28&ProductSubNodeID=79) however, their product did not fit the needs that my client had, and they ended up breaking functionality in a release, and then eventually stopped responding to queries…

    I suspect the way they open PDFs via their utility is the same thing you are doing, and they can support other Office docs as well. That logic, however, is locked away in compiled code, so I was hoping you might know of some query string voodoo that does the same.

    If I do come across anything that can do it, I’ll post back here to share!

  7. Hi Andy, I am a novice developer and have a problem in the same context.
    I am crawling PDF files using another search engine’s crawler, and when I click a link on the search results page it should take me to the matching location in the PDF (with highlighting).
    I tried the following link. It opens the search UI, but I have to enter the keyword again manually to get to that location (it should go there automatically).

    url entered:

    search

    Can you guide me on how the solution posted above applies in this context?

  8. I already tried this solution, and it helped me a lot.
    But I face a problem: not every user’s PC gets this result.
    For some users, clicking the result opens the document automatically in Acrobat Reader, and the highlighting fails.
    For other users, the document opens in the browser, and the highlighting works.
    Is there any way to prevent the document from opening in Adobe Reader, so that it opens in the browser instead?

  9. I’m stuck. Here is my XSL template:

    {XSL Removed by WordPress}

    Do I need to add anything else? It isn’t passing #s in the PDF search result links.

    Thanks for any advice.

  10. Ah! Got an idea. I think I remember having a problem with this that sometimes the file extension comes through capitalised, and other times it doesn’t. In XSL ‘.pdf’ != ‘.PDF’.

    I forget off the top of my head what I had to do to fix this – I’ll try and look it up sometime soon.
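
    A common XSLT 1.0 way to handle this (a sketch, not necessarily the original fix) is to fold the extension to lower case with translate() before comparing it. The fileextension element name and the Result match are assumptions, and both ‘pdf’ and ‘.pdf’ are accepted because it isn’t clear whether the value carries a leading dot.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- XSLT 1.0 has no lower-case() function, so map A-Z onto a-z by hand. -->
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'" />

  <!-- Assumption: each search result is a Result element with a fileextension child. -->
  <xsl:template match="Result">
    <xsl:variable name="ext"
        select="translate(fileextension, $upper, $lower)" />
    <xsl:choose>
      <!-- Matches PDF, pdf, .PDF, .pdf and any other mix of case. -->
      <xsl:when test="$ext = 'pdf' or $ext = '.pdf'">PDF</xsl:when>
      <xsl:otherwise>other</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```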

  11. * This search matches every word in the query, but I need to highlight only the exact word or phrase.

    * How can I get the same functionality in a Word document?

  12. Hmm. I don’t know; check the Acrobat Reader documentation. It might not be possible.

    As for Word, no idea – not sure if that’s possible. I suspect it isn’t. It’d be interesting to have a look though!

  13. I have implemented this XSL. It’s working fine.

    Is there any way to search for an exact phrase or word in the PDF?

    At the moment, when searching for a phrase, it searches for each word separately and gives more results.

    Can anyone tell me how to implement an exact phrase search?

  14. This hit-highlighted search is not working for SharePoint advanced search.
    Can anyone help to solve this?

  15. Is your advanced search page taking you to the standard Search Results page? Check the properties on your Advanced Search web part – or that the URL is to the correct results page after you’ve done an advanced search.

  16. Hi Andy,

    Yes, my advanced search page takes me to the standard Search Results page. Is there any way to solve this problem? Can you guide me?

  17. Assuming that you’ve created a new search results page for your results with PDF hit highlighting, then…
    – go to the ‘home’ page of your search center,
    – edit it,
    – edit the properties of the web part on it,
    – and set the results page URL to the new results page with PDF highlighting.
    – Save and publish the page.

  18. Hi,

    I copied and pasted this into my results.xml and nothing happened! I had no errors but when I went to test it none of the query terms were highlighted in the PDF? I have no idea what I’m doing wrong – I’m also not an expert XSLT dev so that could be part of the problem :)

    I had to add query as a parameter ( ) so not sure if this is where I’m going wrong. Any help would be appreciated.

    Thanks.

  19. Hmm. It’s a tricky one. Try running the same search inside the PDF. What I’ve found is that the reader doesn’t always obey the query string parameters as well as we’d like.

    Did you modify the Core Results web part to bring back the file extension, as linked to above?

    Also, is the search term actually in the document? (We had a PDF that contained pages as non-OCR’d images, and the search term was metadata against the List Item itself. Thus, we could see our data, but not search for it within the document)

  20. My #search=’query’ always comes out empty. I followed the steps outlined above, but it is still empty.

    My XSLT is as follows:

    {XSL Removed by WordPress}

    // Show or hide the "definitionSelection" element.
    function ToggleDefinitionSelection()
    {
        var selection = document.getElementById("definitionSelection");
        if (selection.style.display == "none")
        {
            selection.style.display = "inline";
        }
        else
        {
            selection.style.display = "none";
        }
    }


  21. Hi all,

    I’m trying to apply this post to an installation of Search Server 2008 Express on a Windows 2003 server.

    I’m having trouble understanding where I need to put these lines. This one never works:

    {XSL Removed by WordPress}

    And where do I put these ones?

    {XSL Removed by WordPress}

    Thanks for your help with this.

    Fred

  22. Edit to my previous post:

    The line that never works is:
    xsl:variable name="q" select="$query"

    Where should I put the lines under the tr tag?

    Regards

  23. Hey there. I’m trying to get this implemented on our SharePoint site and am having a hard time. I keep getting an “unknown error” when trying to search after adding this XSL to the Core Results web part on the osssearchresults.aspx page.

    I tried adding the code with the rest of the XSL templates, before them, after them, etc., and also tried just replacing the existing XSL. All of these ended up throwing the error. Obviously I’m doing something wrong, but I just don’t know what. Is there something in your code that needs to be customized? Where exactly do I insert the code within the XSL area?

    I’m on WSS 3.0, if that matters. Thanks!

  24. Hard to know without reviewing the code, which is a bit too much like work! Is there anything in the logs?

    FYI, I find that “unknown error” is the devil when modifying search results…

  25. I opened the page in SharePoint Designer and it was throwing this error:

    “a reference to variable or parameter ‘query’ cannot be found”

    Once I added a parameter called query, the web part did function, but the icons still do not appear. There is something wrong with the code, since the icons do not appear at all. Apparently neither if statement is true. I’ve tried in vain to edit anything and everything in your code and can’t get it to work. Any ideas?

    Thanks

  26. And also, it seems to have no idea what the variable $query even is, as the string after “search=” is empty.

  27. Hi,
    I realise this is probably just as easy to implement in 2010, but the markup is slightly different. I can find the location in the Core Results web part, but the query variable does not work, so I can’t see a way of passing the query to the URL. Can someone tell me how to assign the query to a variable so I can include it in the URL?
    $query doesn’t seem to work in 2010?
    Many thanks,

  28. Is hit highlighting possible in MS Office documents? I’d appreciate it if anyone could provide input on this.

  29. I’m pretty sure that the Search hit highlighting does show the region around the hit in the search results. I’m not aware of any way to open the document up with the search hits shown.

Comments are closed.