Edward Capriolo

Monday Mar 19, 2012

More Taco Bell Programming with Solandra

I was super excited that my blog has been getting picked up in places like planetcassandra.com and nosql.mypopescu.com. But then yesterday I realized that writing a bunch of cruddy, half-joking dev-ops blogs is sure to land me back in obscurity, so I decided to write something serious.

First, some back story. At the beginning of my Hadoop career, I remember very clearly being attracted to Hadoop by one very interesting use case: Hadoop at Rackspace. This attracted me for two reasons. The first was that I had just done some custom distributed Lucene work. The second was that the first application I ever wrote at my first job was a Java application that wrote syslog messages to MySQL, with a front end to search them.

Because I wrote a syslog -> MySQL application, I was never impressed by Splunk. The fact that its commercial license kept timing out features and trying to charge me cemented my dislike.

Over the years I have written several programs that read and write logs thanks to Hadoop, but I always wanted to take a shot at writing that awesome near-real-time full text search system. Finally I slotted a weekend and went for it.

I knew Jake had coded Solandra a while ago: https://github.com/tjake/Solandra . I had never really sat down and appreciated what an awesome job it was. I mean, backending Solr into Cassandra. Fricken amazing.

Someone who wrote a small review on their blog about my IronCount work called it "Taco Bell programming". I had never heard that term before, and I have to say I love the concept. I love just taking some basic ingredients and whipping something together. You can call me a Taco Bell programmer any day! Looking at Solandra and the Reuters demo, my Taco Bell mind kicked in.

Ed's Mind: "Dude. You can take Solandra and this AJAX thing and bang out a Taco Bell version of the Rackspace thing in no time."

 schema.xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="log" version="1.1">
 <types>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>  

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

  </types>

  <fields>
    <field name="id" type="string" indexed="true" stored="true" />
    <field name="user" type="string" indexed="true" stored="true" />
    <field name="date" type="date" indexed="true" stored="true"/>
    <field name="message" type="text" indexed="true" stored="true" termPositions="true"/>
    <field name="application" type="string" indexed="true" stored="true" termPositions="true"/>
    <field name="instance" type="string" indexed="true" stored="true" termPositions="true"/>
    <field name="fqdn" type="string" indexed="true" stored="true" termPositions="true"/>
    <field name="level" type="string" indexed="true" stored="true" termPositions="true"/>

    <field name="allText" type="text" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
  </fields>

  <copyField source="user" dest="allText"/>
  <copyField source="date" dest="allText"/>
  <copyField source="message" dest="allText"/>
  <copyField source="application" dest="allText"/>
  <copyField source="instance" dest="allText"/>
  <copyField source="fqdn" dest="allText"/>
  <copyField source="level" dest="allText"/>

  <defaultSearchField>message</defaultSearchField>
  <uniqueKey>id</uniqueKey>

  <solrQueryParser defaultOperator="OR"/>

</schema>
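One gotcha with the `date` field above: Solr's DateField expects ISO-8601 timestamps in UTC, like `2012-03-19T14:05:00Z`, so the formatter on the client side has to use that exact pattern and time zone. A minimal Java sketch (the class and method names here are just illustrative, not from Solr):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDate {
    // Solr's DateField wants ISO-8601 in UTC, e.g. 2012-03-19T14:05:00Z
    public static String format(Date d) {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        sdf.setTimeZone(TimeZone.getTimeZone("UTC")); // Solr assumes UTC
        return sdf.format(d);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // the epoch, formatted for Solr
    }
}
```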

I am starting to love me some Groovy. I should have written a log4j appender, and eventually I will have to, but instead I decided to taco-bell-express a logger. I love Groovy's @Grab annotation. I used to not be a fan of this type of stuff (pulling jars from the net), but I guess I am a convert now.

[root@tablitha groovy]# more core/Solr.groovy 
package core;

import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.TimeZone;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.common.SolrInputDocument;

@Grab(group='org.apache.solr', module='solr-solrj', version='1.4.1')
@Grab(group='org.slf4j', module='slf4j-nop', version='1.6.2')


public class Solr {

  public CommonsHttpSolrServer server;
  public AtomicLong counter;
  public UUID myId;
  public String user;
  //message filled in by caller
  //date filled in automatically
  public String application;
  public String instance;
  public String fqdn;
  //level filled in by caller

  public Solr() {
    myId = UUID.randomUUID();
    counter = new AtomicLong(0);
    fqdn = System.getenv("HOSTNAME");
    user = System.getenv("USER");
  }

  public void connect(String url) {
    server = new CommonsHttpSolrServer(url);
    server.setSoTimeout(1000); // socket read timeout
    server.setConnectionTimeout(100);
    server.setDefaultMaxConnectionsPerHost(100);
    server.setMaxTotalConnections(100);
    server.setFollowRedirects(false); // defaults to false
    server.setMaxRetries(1); // defaults to 0. > 1 not recommended.
    server.setParser(new XMLResponseParser());
  }

  public void logIt(Object level, Object message) {
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", "" + myId + "_" + counter.getAndAdd(1));
    doc1.addField("fqdn", fqdn);
    doc1.addField("message", message.toString());
    doc1.addField("user", user);
    // Solr's DateField wants ISO-8601 in UTC
    DateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
    doc1.addField("date", sdf.format(new Date()));
    doc1.addField("level", level.toString());
    if (application != null) {
      doc1.addField("application", application);
    }
    if (instance != null) {
      doc1.addField("instance", instance);
    }
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    docs.add(doc1);
    //there is a way to add and commit in one call... find that
    if (server != null) {
      server.add(docs);
      server.commit();
    }
  }

}
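One design note on the class above: the `id` field pairs a per-process random UUID with an atomic counter, so document ids stay unique even with many hosts logging concurrently and no coordination between them. A stripped-down Java sketch of just that scheme (the class name is mine, for illustration):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

public class IdScheme {
    // Each process draws one random UUID at startup...
    private final UUID myId = UUID.randomUUID();
    // ...and a counter makes ids unique within that process.
    private final AtomicLong counter = new AtomicLong(0);

    public String nextId() {
        return myId + "_" + counter.getAndIncrement();
    }

    public static void main(String[] args) {
        IdScheme scheme = new IdScheme();
        Set<String> seen = new HashSet<String>();
        for (int i = 0; i < 1000; i++) {
            seen.add(scheme.nextId());
        }
        System.out.println(seen.size()); // all 1000 ids are distinct
    }
}
```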

It took me a little bit of jerking around with jQuery, which I had never worked with before, but I was able to get the tag clouds, auto-complete, and pagination support working against my new schema. (Still screwing with the date picker widget.) But I am really happy.
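For the curious, the tag cloud and pagination map onto plain Solr query parameters: facet counts on `application` drive the cloud, and `start`/`rows` drive the paging. A hedged Java sketch of building such a query string against the schema above (the class name and exact parameter choices are mine, not from the demo):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class LogQuery {
    // Build Solr select params: full-text search over allText,
    // faceting on application (for the tag cloud), and paging.
    public static String params(String q, int page, int rows)
            throws UnsupportedEncodingException {
        return "q=" + URLEncoder.encode("allText:(" + q + ")", "UTF-8")
             + "&facet=true&facet.field=application"
             + "&start=" + (page * rows) + "&rows=" + rows;
    }

    public static void main(String[] args) throws Exception {
        // page 2 of 10-per-page results for "timeout"
        System.out.println(params("timeout", 2, 10));
    }
}
```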

Why?

1. Real-time searchable logs

2. Scalability with Solandra

3. The apps do not know it is not Solr (no need for custom drivers, etc.)

4. Results at Taco Bell speed

  
