So, I’ve been posting about some work I’ve been doing with SharePoint Search. Now, to pull it all together: I’ve been trying to do hit highlighting inside Adobe Reader, so that if a user clicks on a search result, Adobe Reader opens and automatically searches for occurrences of the search term. Doing this requires knowing the file extension (so you only process PDFs this way), knowing the query string parameter that holds the search term, and knowing that you can pass parameters into Acrobat Reader. I’m not going to go too much into that last part, but you can find documentation about this on Adobe’s website (pdf).
So, what we want to do is for each search result:
- see if it is a PDF
- if it is, pass the search phrase in the url
Um, I guess I’ll skip to the code:
You can get this as a text file here.
So, what does this show? Firstly, you can see that the query string is being passed into our Search Core Results web part, as described previously. We’re also selecting the FileExtension column, also as described previously. Our XSL checks whether the extension is PDF and, if so, emits the URL to the item plus:
#search='query'
These bits are highlighted in yellow. Otherwise, we just output the link. (Note: this page doesn’t really use a lot of the neat features of your search results XML, like the hit-highlighted title).
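The decision the XSL makes for each result boils down to a few lines. Here is a rough sketch of the same logic in Python (not XSLT, and not the original code), assuming you already have the result's URL, its file extension and the search term to hand. The function names are hypothetical.

```python
import html


def build_result_href(url: str, file_extension: str, query: str) -> str:
    """Return the link target for one search result.

    PDFs get Adobe's #search='...' open parameter appended so the Reader
    plugin searches for the term when the file opens; other types are untouched.
    """
    if file_extension.strip().lower() == "pdf":
        return f"{url}#search='{query}'"
    return url


def build_result_anchor(url: str, file_extension: str, query: str, title: str) -> str:
    """Wrap the link in an HTML anchor, HTML-escaping both the href and the title."""
    href = build_result_href(url, file_extension, query)
    return f'<a href="{html.escape(href)}">{html.escape(title)}</a>'


print(build_result_href("http://server/docs/ferrets.pdf", "PDF", "Stupid Ferret"))
print(build_result_href("http://server/docs/ferrets.docx", "docx", "Stupid Ferret"))
```

The term is appended as-is. The sketch doesn't deal with a search term that itself contains an apostrophe, which would break the quoting, so you may want to strip or encode that depending on how your links are emitted.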
So what happens then if a user clicks on this link? I searched for ‘Stupid Ferret’ in the example below…
Tada - our search terms are searched for automatically when we open the PDF!


Hi Andy,
I don’t know if you are aware of BA-Insight’s Longitude, but it not only does hit highlighting within PDFs (or any other file type), it also detects the most relevant pages in the document and presents those to the user without the need to download and open the document.
Cheers!
Martin
This looks like good stuff, and I have verified that I can highlight by manually entering a query URL. But where in the world do I put the XSLT you’ve shown above?
On the master page I have this, where the search box goes:
Do I somehow need to associate this with a control? Is there something to do with the Core Results Page (which I am unable to locate!)? Does it require a blood sacrifice?
I am using WSS 3.0, and I am pulling my hair out trying to get any kind of results.
Thanks,
Mark Edwards
medwards@infassure.com
Hi Mark,
Ah, right, the XSLT goes in your results page. Go to the results page. Go Site Actions > Edit Page. Modify the Core Results Web Part. One of the options in the toolbar on the left is to modify the XSLT (it’s a button).
Click the button and copy and change your XSLT!
This post shows doing this:
http://www.novolocus.com/2008/05/09/how-do-i-get-the-xml-of-my-search-results/
The aim of the XSLT in that post is to show the raw output of the Search Web Service.
Regarding the master page, you’ll want to make sure that its search box points to the correct results page, though which page that is depends on your system, so it’s hard for me to give specific instructions. Basically, all the search box does is forward you to the results page and pass it your search term.
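To make that hand-off concrete: on MOSS 2007 and WSS 3.0 results pages the search term usually travels in a `k` query string parameter, but treat that as an assumption and check the URL your own search box produces. This Python sketch shows both sides of the hand-off; the results page path is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the term travels in the "k" parameter, as it usually does on
# MOSS 2007 / WSS 3.0 results pages. Check the URL your search box produces.
QUERY_PARAM = "k"


def results_page_url(results_page: str, term: str) -> str:
    """What the search box does: forward to the results page with the term."""
    return f"{results_page}?{urlencode({QUERY_PARAM: term})}"


def term_from_results_url(url: str) -> str:
    """What the results page (and its XSL) reads back out of the query string."""
    values = parse_qs(urlparse(url).query).get(QUERY_PARAM, [])
    return values[0] if values else ""


url = results_page_url("http://server/Search/results.aspx", "Stupid Ferret")
print(url)                         # http://server/Search/results.aspx?k=Stupid+Ferret
print(term_from_results_url(url))  # Stupid Ferret
```

This assumes the results page URL has no query string of its own yet. If yours does, you'd need to merge the parameters rather than just appending `?k=...`.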
Great post - helped me out in a pinch with a client, as we were having issues with KWizCom, their support, and their product…
Question - any thoughts on how to extend this to MS Office documents? I’ve done loads of searching, and have found nothing so far…
Thanks!
Hmm. Yes, I don’t know about hit highlighting within Office documents - is that even possible?
The PDF highlighting was only possible because the Adobe Reader browser plugin would search based on a query string parameter, so essentially, it’s a feature of the client side application. I’ve never seen something like that in Office, I’m afraid!
On the other hand, though, maybe you just mean hit highlighting in the search results page itself? If so, MOSS 2007 does this out of the box! This article shows you how to set up hit highlighting on document titles:
http://www.novolocus.com/2008/05/19/hit-highlighting-in-sharepoint-search-document-titles/
However, if you look at the XML you get back from the Search Web Service, you’ll see something like this:
http://www.novolocus.com/wp-content/uploads/2008/05/hit-highlighting-properties.png
Note that (although not highlighted) there is a Hit Highlighted Summary section, as well as a Highlighted Title and URL.
So, just edit your XSL for your Core Results Web Part to use them. Well, I say ‘just’ - it’s a bit complicated, but I’m sure you can figure it out.
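Here is a rough Python sketch of turning the hit-highlighted fields into HTML, as a starting point. The XML is a hypothetical sample. The element names are assumptions based on the screenshot above: `hithighlightedsummary`, `hithighlightedproperties/HHTitle`, the `c0`, `c1`, ... tags wrapping matched terms, and `ddd` for an elided passage. Check them against the XML your own results page returns. In the Core Results Web Part you would express the same idea in XSL.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical results XML; the element names are assumptions, not a spec.
SAMPLE = """<All_Results>
  <Result>
    <url>http://server/docs/ferrets.pdf</url>
    <fileextension>PDF</fileextension>
    <hithighlightedproperties>
      <HHTitle>All about the <c0>Stupid</c0> <c1>Ferret</c1></HHTitle>
    </hithighlightedproperties>
    <hithighlightedsummary>A <c0>stupid</c0> <c1>ferret</c1> is still a ferret<ddd/></hithighlightedsummary>
  </Result>
</All_Results>"""

HIT_TAG = re.compile(r"c\d+$")  # assumed: c0, c1, ... mark matched search terms


def render_highlighted(elem: ET.Element) -> str:
    """Flatten an element to HTML, wrapping hit tags in <strong>."""
    parts = [html.escape(elem.text or "")]
    for child in elem:
        if HIT_TAG.match(child.tag):
            parts.append("<strong>" + html.escape(child.text or "") + "</strong>")
        elif child.tag == "ddd":  # assumed marker for an elided passage
            parts.append("...")
        else:
            parts.append(html.escape("".join(child.itertext())))
        parts.append(html.escape(child.tail or ""))
    return "".join(parts)


def summarize(xml_text: str) -> list[dict[str, str]]:
    """Pull the URL, highlighted title and highlighted summary from each result."""
    results = []
    for result in ET.fromstring(xml_text).iter("Result"):
        title = result.find("hithighlightedproperties/HHTitle")
        summary = result.find("hithighlightedsummary")
        results.append({
            "url": result.findtext("url", default=""),
            "title": render_highlighted(title) if title is not None else "",
            "summary": render_highlighted(summary) if summary is not None else "",
        })
    return results


for item in summarize(SAMPLE):
    print(item["title"])
    print(item["summary"])
```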
Andy - thanks for the response. I have hit highlighting working as needed, as well as the PDF highlighting thanks to your article. Office docs can be done - just not sure how.
KWizCom, as I had mentioned, does do it (http://www.kwizcom.com/ProductPage.asp?ProductID=28&ProductSubNodeID=79); however, their product did not fit the needs that my client had, and they ended up breaking functionality in a release, and then eventually stopped responding to queries…
I suspect their utility opens PDFs the same way you are doing it, and it supports other Office docs as well. However, that logic is locked away in compiled code, so I was hoping you might know of some sort of query string voodoo that does the same for Office documents.
If I do come across anything that can do it, I’ll post back here to share!
Hi Andy, I am a novice developer and have a problem in the same context.
I am crawling PDF files using another search engine's crawler, and when I click on a link on the search results page it should take me to the place in the PDF where the term appears (with highlighting as well).
I tried the following link, but it opens the search UI and I have to enter the keyword again manually to get to that location (it should take me there automatically).
url entered:
search
Can you explain how the solution posted above applies in this context?
Hmm. Your link came through empty. Try just typing in the URL - e.g.
http://someserver/restofurl
The important bit is making sure the URL ends in #search='query'
I already tried this solution, and it helped me a lot.
But I have run into a problem.
My problem is that not every PC (user) gets this kind of result.
For some users, when they click the result, the document automatically opens in Acrobat Reader itself and the highlighting doesn't work.
But for other users, where the document just opens in the browser, the highlighting works.
Is there any step or way to stop the document opening in Adobe Reader and have it open in the browser instead?