Introduction to Solr

Apache Solr

Apache Solr is a free, open source search engine.  It provides fast search against structured, semi-structured and unstructured data.  Its cloud-enabled mode, SolrCloud, shards and replicates large search indexes across a cluster of servers, and the index can also be stored on HDFS in a Hadoop cluster.  It forms the search backbone used by companies such as Best Buy, Sears, eHarmony and more.  In fact, it is reportedly used by 90% of Fortune 500 companies.

Now that you know what Solr is, you might question how a search engine would be used in a BI infrastructure, but the very things that make it an excellent search engine also make it a potent data store for analytic use.  After all, search engines are nothing more than specialized databases.  In this article, we’ll outline a few BI applications of Solr.


Scenario 1: Text Analytics with HR

Hiring managers often have to go through piles of resumes just to find a dozen or so to interview.  If you’re in HR, you’re going through resumes ad nauseam.  Whereas traditional databases have limited text processing capabilities, Solr is much better suited to analyze and filter resumes submitted for job openings.

It’s a natural use case for Solr.  It can be fed native documents, including PDF, Word, XML or plain text, and integrate them into its index.  Because it was developed as a search engine, it can easily process unstructured text.  It can extract key words and phrases, perform language detection and transparently deal with differing word forms.
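As a rough illustration of feeding a native document to Solr, the sketch below builds a request for Solr's extracting request handler (`/update/extract`), which parses rich documents and indexes the extracted text.  The host, the `resumes` collection and the `job_opening_s` field are assumptions made up for this example, and the handler has to be enabled in the collection's configuration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(base_url, collection, doc_id, literals=None):
    """Build a URL for Solr's extracting request handler (/update/extract).

    Each literal.<field> parameter sets a stored field value alongside
    the text that Solr extracts from the uploaded file.
    """
    params = {"literal.id": doc_id}
    for field, value in (literals or {}).items():
        params["literal." + field] = value
    params["commit"] = "true"
    return f"{base_url}/solr/{collection}/update/extract?{urlencode(params)}"


def post_resume(url, path, content_type="application/pdf"):
    """Send the raw file bytes to Solr (needs a running Solr instance)."""
    with open(path, "rb") as handle:
        request = Request(url, data=handle.read(),
                          headers={"Content-Type": content_type})
    with urlopen(request) as response:
        return response.status


# Hypothetical collection and field names, for illustration only.
url = build_extract_url("http://localhost:8983", "resumes", "resume-001",
                        {"job_opening_s": "data-engineer"})
print(url)
```

Only `build_extract_url` runs without a server; `post_resume` is there to show where the file itself would be sent.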

After a hire, periodic reviews can be combined with keywords, key phrases and other metadata extracted from the source resume to form a predictive model, which can then be used in later hiring processes.

Scenario 2: Spatial Analytics with Strategic Planning

When a store chain grows from a local to a regional endeavor, new locations better serve existing customers and attract new ones.  A strategic planner has to make the age-old decision for a business storefront: location.  Here, too, Solr has specialized features that can help.  Its geospatial features allow the strategic planner to plot existing and potential customers on a map, and easily incorporate distance into the ranking of each potential location.
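To make the distance idea concrete, here is a minimal sketch of the query parameters for a Solr spatial search around one candidate location.  It uses Solr's `geofilt` filter and `geodist()` function; the `location` field name and the coordinates are assumptions, and the field would need to be a spatial type in the schema.

```python
from urllib.parse import urlencode


def nearby_customers_params(lat, lon, radius_km, field="location"):
    """Parameters for customers within radius_km of a candidate site,
    nearest first, with the computed distance returned per document."""
    return {
        "q": "*:*",
        "fq": "{!geofilt}",            # keep only documents inside the circle
        "sfield": field,               # spatial field (hypothetical name)
        "pt": f"{lat},{lon}",          # center point: the candidate location
        "d": str(radius_km),           # radius in kilometers
        "sort": "geodist() asc",       # rank by distance from pt
        "fl": "id,dist_km:geodist()",  # return the distance as dist_km
    }


params = nearby_customers_params(45.15, -93.85, 10)
print(urlencode(params))
```

Running the same query once per candidate location and comparing the result counts and distances is one simple way a planner might compare sites.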

Likewise, when visualizing customer purchases, its grouping features can quickly break down customers by distance traveled, amount purchased, or number of visits.
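One way to sketch the breakdown by distance traveled is with facet queries over `geodist()` ranges, which return a customer count per distance band.  Note that this uses faceting rather than Solr's result grouping feature, and the `location` field and band boundaries are assumptions for illustration.

```python
def distance_band_params(lat, lon, bands_km, field="location"):
    """Parameters that count documents in each (lower, upper) distance band."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),        # only the counts are needed, not the documents
        ("facet", "true"),
        ("sfield", field),    # spatial field (hypothetical name)
        ("pt", f"{lat},{lon}"),
    ]
    for lower, upper in bands_km:
        # incu=false excludes the upper bound so adjacent bands do not overlap
        band = "{!frange l=%s u=%s incu=false}geodist()" % (lower, upper)
        params.append(("facet.query", band))
    return params


for name, value in distance_band_params(45.15, -93.85, [(0, 5), (5, 20)]):
    print(name, "=", value)
```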

Scenario 3: Log File Analytics with Manufacturing

Manufacturing operations track parts from the time they enter inventory until they leave the line fully assembled.  All the machines on the assembly line record log entries, and they might post entries with different structures.  That line might be one of dozens or hundreds.  With that volume, you need very efficient and scalable ingestion and search.  Solr can operate in SolrCloud mode, scaling out across additional servers as volume grows.  Combined with an ingestion tool like Apache NiFi, Solr can index extremely high volumes of data.

Because it is first a text processing engine, it can deal with vagaries of structure, searching the general text or extracting values into appropriate fields as the entries are indexed.  It can power responsive dashboards showing production rate, defect rate and similar metrics.  They can be filtered by date range, batch, product line, location or even by keyword.  Solr can usually handle these filters in near real-time.
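The dashboard filters described above map naturally onto Solr filter queries (`fq`) and range faceting.  The sketch below assembles such a request; the field names `timestamp`, `batch` and `product_line` are hypothetical and would depend on how the log entries were indexed.

```python
def dashboard_params(start, end, batch=None, product_line=None,
                     keyword=None, gap="+1HOUR"):
    """Parameters for an entries-per-interval chart with optional filters."""
    params = [
        ("q", keyword or "*:*"),                    # keyword search, or everything
        ("rows", "0"),                              # the chart only needs counts
        ("fq", f"timestamp:[{start} TO {end}]"),    # date range filter
    ]
    if batch:
        params.append(("fq", f'batch:"{batch}"'))
    if product_line:
        params.append(("fq", f'product_line:"{product_line}"'))
    params += [
        ("facet", "true"),
        ("facet.range", "timestamp"),               # bucket entries over time
        ("facet.range.start", start),
        ("facet.range.end", end),
        ("facet.range.gap", gap),                   # one bucket per hour
    ]
    return params


for name, value in dashboard_params("2016-01-01T00:00:00Z",
                                    "2016-01-02T00:00:00Z",
                                    batch="B-1042", keyword="jam"):
    print(name, "=", value)
```

Each filter is a separate `fq` parameter, which lets Solr cache the filters independently as the user toggles them on the dashboard.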

Will Solr 6 Provide Analytics for Anything?

Solr 6, due in the first half of 2016, introduces a new SQL query engine.  That allows it to be a data source for, well, just about anything.  SQL opens up a new world of complex queries and makes Solr available to a much broader audience already familiar with SQL.  It might just be a new day for Solr.
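As documented for Solr 6, the SQL interface takes a statement in a `stmt` parameter posted to a collection's `/sql` handler.  Here is a minimal sketch of such a request; the `machine_logs` collection and its `product_line` field are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sql_request(base_url, collection, stmt, aggregation_mode=None):
    """Return the (url, body) pair for a POST to Solr's /sql handler."""
    fields = {"stmt": stmt}
    if aggregation_mode:
        # Optional: Solr documents "facet" and "map_reduce" modes
        fields["aggregationMode"] = aggregation_mode
    url = f"{base_url}/solr/{collection}/sql"
    return url, urlencode(fields).encode("utf-8")


def run_sql(url, body):
    """Send the statement and return the raw response text
    (needs a running SolrCloud instance)."""
    with urlopen(Request(url, data=body)) as response:
        return response.read().decode("utf-8")


# Hypothetical collection and field, for illustration only.
stmt = ("SELECT product_line, count(*) FROM machine_logs "
        "GROUP BY product_line ORDER BY count(*) DESC LIMIT 10")
url, body = build_sql_request("http://localhost:8983", "machine_logs", stmt)
print(url)
print(body.decode("utf-8"))
```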
