Deepening the Conversation

thinking about questions of authority, technology, learning, and 2.0 in academic libraries

First Thoughts on Federated Search


We are in the early days of discussing federated search at MPOW, and I am very leery of the approach we are taking, which seems aimed at a predetermined end: we will get a federated product, and we just have to decide which features we want.

As is my nature, I want us to discuss whether or not we actually want federated search, and my colleagues have addressed that by asking me to compile a list of my concerns. Federated search is something I followed very closely before coming here, but I am admittedly behind the curve on newer innovations and improvements in the technology. I have a stack of articles to read, and I am sure they will change my concerns. But, to track my own thinking process (publicly), I thought I’d put my initial concerns up here. All feedback is welcome!

  1. Subject databases don’t index their primary subject. A sloppy search in a federated context (and let’s admit up front that there will be lots and lots of sloppy searches) will leave out the most relevant hits – the ones never indexed under that primary subject.
  2. Information literacy requires an ability to select the right tool for the job. Federating assumes the opposite to be true. Format matters (because content is frequently format driven), and if the federated product includes OPAC, newspaper searches, statistical sources and article databases in the search, we are putting the librarian stamp of approval on the assertion that format doesn’t matter.
  3. A single Google box will inspire Google-like searching, which patently does not work with paid, indexed library resources.
  4. The intricacies of library search are not just there for decoration; the indexing and special limiters in each database exist because they allow searchers to get better results. Federated search removes many (if not most) of the special features that (a) improve precision and recall in search processes and (b) often drive collection development decisions.
  5. I am cautious about how federated search affects precision and recall. The goal is more good results, not just more results.
  6. My final initial concern is one I have been told is no longer valid: federated search used to pull results in whatever default order the source databases sorted them. For example, an OPAC search would come back most recent first, an EBSCO search would come back relevancy ranked, and some other database would come back oldest first, and these results would get all mixed up together in a hodge-podge (a rough sketch of what that looks like follows this list). Is this fixed now? In all federated tools?
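
To make concern 6 concrete, here is a minimal sketch (made-up data and a hypothetical naive merge on my part, not any particular vendor’s behaviour) of what you get when a federated tool simply interleaves result lists that were each sorted by a different default:

```python
# Hypothetical illustration of concern 6: three sources return results in
# different default orders, and a naive merge interleaves them without re-scoring.
from itertools import zip_longest

# Each source's "page one," already sorted by that source's own default.
opac     = ["2024 title", "2023 title", "2022 title"]          # most recent first
ebsco    = ["best match", "second-best match", "third match"]  # relevancy ranked
other_db = ["1987 article", "1994 article", "2001 article"]    # oldest first

def naive_merge(*result_lists):
    """Round-robin the lists together without re-ranking anything."""
    interleaved = zip_longest(*result_lists)
    return [hit for group in interleaved for hit in group if hit is not None]

print(naive_merge(opac, ebsco, other_db))
# ['2024 title', 'best match', '1987 article', '2023 title',
#  'second-best match', '1994 article', ...]  <- the hodge-podge
```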

I’m off to read my articles and refine my thoughts. While I’m doing that, please contribute: what do you like or hate about federated search? What would you look for if you were shopping for one? And, if you’re brave, what’s the decision-making process like at your POW?

5 thoughts on “First Thoughts on Federated Search”

  1. Re: 1, 3, & 6

    1. Federated searches only do sloppy searches of external indexes. The lowest common denominator (lcd) is used. I’ve been told otherwise, but my test searching in my FedSearch product shows lcd results for each database configured.

    3. Google-like searches will not work on dispersed indexes. A true federated search needs centralized indexes for better searching. I would much prefer to be able to NCIP the indexing data, insert it into my own app, and crunch the search myself — providing referring links back to the vendor for content. Faster, easier, *cheaper, by far, for the content vendor,* and a better experience for my users.

    6. Arguably, that particular “feature” is fixed. With that said, my experience is that when I want to automagically concatenate all the results from a particular search across databases into a de-duped single list, *only the first 10 results from each database are included in that list* (a rough sketch of what I mean follows this comment). I’m sure that number is configurable these days, but a local index would do it faster, better, and without the arbitrary results cap.

    Corrections to my possible misperceptions are welcome. These are only my grumpy deductions from observed phenomena.
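
    A minimal sketch of the merge behaviour I mean, with the cap and the de-dupe made explicit (purely illustrative assumptions on my part, not any vendor’s actual code):

```python
# Illustrative only: merge per-database result lists into one de-duped list,
# keeping just the first N hits from each source -- the cap I keep running into.
PER_SOURCE_CAP = 10  # assumed default; presumably configurable in real products

def merge_and_dedupe(results_by_source, cap=PER_SOURCE_CAP):
    """results_by_source maps a database name to its list of (title, url) hits,
    each list already in that database's own default order."""
    merged, seen_titles = [], set()
    for source, hits in results_by_source.items():
        for title, url in hits[:cap]:      # everything past the cap is silently dropped
            key = title.strip().lower()    # crude de-dupe key
            if key not in seen_titles:
                seen_titles.add(key)
                merged.append((title, url, source))
    return merged
```

    A locally harvested index would let you rank across everything instead of truncating each source before you even start.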

  2. We implemented federated search about two weeks before fall semester. Students love it (especially freshmen), but I hate it. You are right to have concerns re precision and recall – it is very slanted toward the recall side of things. In fact, all of your concerns are right on. As for #6, results are now sorted by ‘relevance’, but as you point out in #1, subject indexes don’t index by subject, and I think that throws off the relevance ranking. I bypass the federated search as much as possible and encourage students to do the same. It’s just another case of lowest-common-denominator thinking, in my opinion – now students don’t have to learn how to research in order to get articles! Who cares that the articles they get are mostly crap?

    As for the decision making process (and this is why I put my name as Anon.), we are currently not involved in it for the most part. We have a new dean and she just does things. If we’re lucky she tells us about them before they go live. We had about a week’s notice with the federated search. And since it went live two weeks before the start of the semester we had little time to work out any bugs or become familiar with the interface ourselves. There’s been no request for our feedback on this or many other decisions, and if we do give feedback it is usually disregarded or belittled. So I guess you can be glad that at least you are having discussions, even if they have a predetermined assumption.

  3. We got rid of our federated search because… well, it sucked. It pulled in the first 100 results from each database we chose for the different categories (general resources, sociology, religion, education, etc.), and as you know, databases have different default setups, like EBSCO defaulting to newest first and JSTOR to its relevancy rankings. It pulled the first 100 results from each database, did some sort of ranking of its own (a bad ranking, in my opinion) based on those results, and those composed the first few pages of results. After those first 100 results from each database got mixed around like that, the next pages consisted of the rest of the results from all the databases in some random order, and it’s just not good.

    The thing is, students rarely go past the first page of results, so they may not even be getting to what is really relevant for their topic. And we teach students that if a search returns a lot of results, they need to rework their search terms and search again to get a narrower, more focused set of results. What does it say to them when we then offer a federated search that retrieves thousands and thousands of results?

    And, you are completely right, it does not help with information literacy, where we try to get them to think independently about which sources to use based on what they need.

    Giving them what they want in terms of research isn’t always the best thing. And just because they are the generation that grew up with Google doesn’t mean they even know how to use Google correctly. They don’t know the complicated searches that are sometimes needed for in-depth research in library databases, so dumbing research down to make it seem as easy as Google is silly.

    So, yes, we got rid of our federated search because it sucked…and we got no complaints that it was gone.

  4. LeAnn,
    You actually *got rid of* your federated search? That’s pretty amazing!

    Do you mind sharing which federated tool, and when/what version? (Just want a fully stocked toolkit of armaments!)

  5. Hey, Rudy,

    We had MultiSearch through CSA: http://www.csa.com/e_products/MS_main.php, though I’m not sure of the exact version (or even if there are multiple ones?). I tried calling a colleague of mine in our other library, our electronic resources librarian, but he must not be around. I can find out for you later on that one.

    Anyway, there are multiple reasons we all didn’t like it, some like I said before, but I thought of another one. When students would click “Search,” either from the little search box on the library homepage or in MultiSearch directly, MultiSearch had the stupidest way of reporting zero results while it was still fanning out to the databases to actually get the results. It didn’t say that results were pending or anything; while it was fetching, it would just say you had found zero. So instead of waiting for more results to appear, students would click back to the search thinking they had to redo it, not knowing that it just took a bit for the results to show up (a rough sketch of the problem is below). Maybe this has been changed now, who knows, but students weren’t sticking around to use it because it wasn’t as quick at retrieving results as going directly into a database. It wasn’t extremely slow, just not as fast as using a certain database directly, and you know how most students want results ASAP.
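
    To illustrate what I mean (a made-up sketch of the general pattern, not MultiSearch’s actual code): if the screen just reports the count of results received so far, then “still searching” and “genuinely zero hits” look identical to the student.

```python
# Made-up sketch of the "zero results while still searching" problem.
import concurrent.futures
import random
import time

def search_one_database(name):
    """Stand-in for a connector that takes a while to answer."""
    time.sleep(random.uniform(0.5, 3.0))
    return [f"{name} hit {i}" for i in range(3)]

def federated_search(databases):
    results, pending = [], set(databases)
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {pool.submit(search_one_database, db): db for db in databases}
        # Saying "0 results found" at this point is what chased students away;
        # "Searching N databases..." tells them to wait instead.
        print(f"Searching {len(pending)} databases...")
        for done in concurrent.futures.as_completed(futures):
            pending.discard(futures[done])
            results.extend(done.result())
            print(f"{len(results)} results so far; still waiting on {len(pending)} databases")
    return results

federated_search(["OPAC", "EBSCO", "JSTOR"])
```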
