
The search engine that powers both CBC.ca and radio-canada.ca is a Google Search Appliance (GSA). As such, most of the tools and search operators that work on Google’s commercial site also work on the CBC.ca search engine.
Query Hacking
Let’s start off with some basics. By default, the search engine “OR”s your search terms. That means that if you type in “blue house” (without the quotes), the search engine will return hits that contain the word “blue” or the word “house”. If you would like to require both words, wrap your search terms in quotation marks. Note that this is stricter than a plain “AND”: only the exact phrase will be matched and returned in your results.
If you would like to omit certain terms from your results, you can prefix them with the minus sign (“-”). If you are looking for information on bass fish (and nothing related to music), you can remove all references to “music” by using “bass -music” (without the quotes) as your search query.
You can restrict your search results to a specific file type by using the “filetype:” query prefix. If you are looking for a specific PDF on our site, you can use “filetype:pdf” in your query. “budget filetype:pdf” (without the quotes) will return all of the PDFs on our site that contain the word “budget”. The GSA can index over 100 types of files, including binary files such as JPGs, TIFFs, PSDs, and Flash content. However, we do not crawl and index these types of files. Only “text” content is indexed (this includes pdf, doc, and xls files).
The “site:” query prefix allows you to restrict your query to a specific section of the site. Some of you might remember that we used to let users restrict their search query to news, sports, or arts by using the tabs at the top of the search results. Those tabs have been gone for a while, but you can still achieve the same thing. For example, say you want to find sports stories about Todd Bertuzzi and not stories about his home life in BC. You can use the following query to return only sports stories: “Bertuzzi site:www.cbc.ca/sports”. This returns 679 results, instead of the 1380 you would get without restricting your search to Sports stories. On the other hand, if you only want to view stories from BC, you would use: “Bertuzzi site:www.cbc.ca/canada/british-columbia”.
The “site:” query prefix is also handy for searching our newsletters. If you want to search the Quirks and Quarks newsletter for “cats”, you can use “cats site:interact.cbc.ca/pipermail/quirks” (without the quotes). If you want to search all of our newsletters, shorten the “site:” prefix to “site:interact.cbc.ca/pipermail”.
It is important not to add a trailing slash when you are using the “site:” query prefix, unless of course you want to restrict your search results to that *exact* URL rather than everything beneath it.
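If you find yourself building these queries often, the operators above are easy to compose in a few lines of Python. This is only a rough sketch: the function name `build_query` and its parameters are made up for illustration and are not part of the GSA. It simply assembles the query string you would type into the search box.

```python
def build_query(terms, phrase=None, exclude=(), filetype=None, site=None, recursive=True):
    """Assemble a search query string from the operators described above.

    All names here are hypothetical helpers, not GSA features.
    """
    parts = list(terms)
    if phrase:
        # Quotation marks force an exact-phrase match.
        parts.append(f'"{phrase}"')
    # A leading minus sign omits a term from the results.
    parts.extend(f"-{word}" for word in exclude)
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        # Without a trailing slash the restriction covers everything under
        # the path; with one, it matches only that exact URL.
        path = site.rstrip("/")
        parts.append(f"site:{path}" if recursive else f"site:{path}/")
    return " ".join(parts)


print(build_query(["bass"], exclude=["music"]))             # bass -music
print(build_query(["budget"], filetype="pdf"))              # budget filetype:pdf
print(build_query(["Bertuzzi"], site="www.cbc.ca/sports/")) # Bertuzzi site:www.cbc.ca/sports
```

Stripping the trailing slash by default is a design choice for this sketch, since the recursive search is usually what you want.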
URL Hacking
Now on to the more advanced stuff. This can only be done by editing the URL directly. Once you have your search results page, you can further change the results by adding or changing parameters in the results URL.
If you would like to get your search results in XML, you need to change two parameters:
1. Remove the proxystylesheet key (proxystylesheet=CBC)
2. Change the “output” key from output=xml_no_dtd to output=xml
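If you would rather script those two steps than edit the URL by hand, something like the following should do it. Treat it as a sketch: the host `search.example.com` and the function name `to_xml_url` are placeholders, not our real search address or any official API. Only the `proxystylesheet` and `output` keys come from the steps above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_xml_url(results_url):
    """Rewrite a search results URL so that it returns XML."""
    parts = urlsplit(results_url)
    params = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "proxystylesheet":
            continue  # step 1: remove the proxystylesheet key
        if key == "output":
            value = "xml"  # step 2: switch output=xml_no_dtd to output=xml
        params.append((key, value))
    if not any(key == "output" for key, _ in params):
        params.append(("output", "xml"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical results URL, for illustration only.
url = "http://search.example.com/search?q=budget&proxystylesheet=CBC&output=xml_no_dtd"
print(to_xml_url(url))  # http://search.example.com/search?q=budget&output=xml
```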
For fun, you can also get the search results using Radio-Canada’s template:
1. Change the proxystylesheet from proxystylesheet=CBC to proxystylesheet=RadioCanada
There are two options for sorting your search results. “By Date” simply sorts the results in chronological order, regardless of relevancy. “By Relevance” puts the results in the order that the GSA considers most relevant. You can combine the two, that is, sort by both date and relevancy, by editing the sort key:
1. Change the sort key to sort=date:D:S:d1
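Here is the same change as a small Python sketch. The host and the function name `sort_by_date_and_relevance` are made up for the example; only the `sort=date:D:S:d1` value comes from the step above. The colons are left unescaped so the resulting URL looks the same as one edited by hand.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def sort_by_date_and_relevance(results_url):
    """Set (or replace) the sort key on a search results URL."""
    parts = urlsplit(results_url)
    params = parse_qs(parts.query, keep_blank_values=True)
    params["sort"] = ["date:D:S:d1"]  # the combined date-and-relevance sort
    # safe=":" keeps the colons readable instead of encoding them as %3A.
    query = urlencode(params, doseq=True, safe=":")
    return urlunsplit(parts._replace(query=query))


# Hypothetical results URL, for illustration only.
print(sort_by_date_and_relevance("http://search.example.com/search?q=budget"))
# http://search.example.com/search?q=budget&sort=date:D:S:d1
```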
If you are looking for French content on CBC.ca (and not radio-canada.ca), you can add the “lr” key to the URL:
1. Tack on &lr=lang_fr to your search page URL.
Conversely, if you are looking for English pages on Radio-Canada’s site, you can add “&lr=lang_en” to their search results page URL.
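Tacking on the language key is simple enough to do in a single line of code. In this sketch the function name `restrict_language` and the example host are hypothetical; the `lr=lang_fr` and `lr=lang_en` values are the ones described above.

```python
def restrict_language(results_url, lang_code):
    """Append lr=lang_<code> to a search results URL."""
    # Use "&" if the URL already has a query string, "?" otherwise.
    separator = "&" if "?" in results_url else "?"
    return f"{results_url}{separator}lr=lang_{lang_code}"


# Hypothetical results URL, for illustration only.
print(restrict_language("http://search.example.com/search?q=hockey", "fr"))
# http://search.example.com/search?q=hockey&lr=lang_fr
```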
Things We’re Working On
Here are some of the “neat” things we’re working on:
Current weather conditions: You can get the current weather conditions in the search results by suffixing your query with “weather”. For example, you can get Toronto’s latest weather conditions by typing in “toronto weather” as your search query. Right now this only works for a select number of cities.
Latest News: If you use a query that is contained in one of today’s news stories, you will see a link to that news story at the very top of your results, highlighted with a blue background.
Feel free to pose any questions about our search engine or suggest any features you’d like to see in a comment. If you would like more technical detail on how to hack our search results, you can find it on the Google API page.



















