links for 2009-05-02
Published May 3rd, 2009 in del.icio.us
"Here are timings for a single counting process: iterate over 45,000 short text messages, tokenize them, then increment counters for their unigrams and bigrams. (The speed of the data store is only one component of performance.) There are about 17 increments per tweet: 400k unique terms and 750k total count. This is substantially smaller than what I need, but it’s small enough to easily test. I used several very different architectures and packages, explained below."
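The counting process the quote describes (tokenize each message, then increment unigram and bigram counters) can be sketched roughly as follows. This is only an illustration, not the quoted author's code: the regex tokenizer, the `count_ngrams` helper, and the sample messages are all assumptions, and an in-memory `collections.Counter` stands in for whichever data store is being timed.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase, then split on runs of word characters.
# The quoted post does not say how it tokenizes.
TOKEN_RE = re.compile(r"\w+")


def tokenize(message):
    """Return the list of lowercase word tokens in a message."""
    return TOKEN_RE.findall(message.lower())


def count_ngrams(messages):
    """Count unigrams and bigrams across an iterable of short messages.

    Hypothetical helper: each message contributes one increment per
    token and one increment per adjacent token pair.
    """
    unigrams = Counter()
    bigrams = Counter()
    for message in messages:
        tokens = tokenize(message)
        unigrams.update(tokens)                  # one increment per token
        bigrams.update(zip(tokens, tokens[1:]))  # one increment per adjacent pair
    return unigrams, bigrams


# Made-up sample data, standing in for the 45,000 messages.
messages = ["the cat sat", "the cat ran"]
unigrams, bigrams = count_ngrams(messages)
```

A message with n tokens yields n unigram increments and n - 1 bigram increments, so the "about 17 increments per tweet" figure would correspond to roughly nine tokens per message under this scheme. Swapping the `Counter` objects for calls into an external store is where the per-increment cost being compared would show up.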