Oren Eini

aka Ayende Rahien

Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net

+972 52-548-6969

Posts: 7,495

|

Comments: 51,046

Copyright ©️ Ayende Rahien 2004 — 2024

Privacy Policy · Terms

filter by tags archive

architecture (588) rss
bugs (440) rss
challanges (123) rss
community (372) rss
databases (468) rss
design (878) rss
development (604) rss
hibernating-practices (69) rss
miscellaneous (591) rss
performance (375) rss
programming (1049) rss
raven (1389) rss
ravendb.net (470) rss
reviews (183) rss
uberprof (41) rss

2024
- April (2)
- March (8)
- February (2)
- January (14)
2023
- December (4)
- October (4)
- September (6)
- August (12)
- July (5)
- June (15)
- May (3)
- April (11)
- March (5)
- February (5)
- January (8)
2022
- December (5)
- November (7)
- October (7)
- September (9)
- August (10)
- July (15)
- June (12)
- May (9)
- April (14)
- March (15)
- February (13)
- January (16)
2021
- December (23)
- November (20)
- October (16)
- September (6)
- August (16)
- July (11)
- June (16)
- May (4)
- April (10)
- March (11)
- February (15)
- January (14)
2020
- December (10)
- November (13)
- October (15)
- September (6)
- August (9)
- July (9)
- June (17)
- May (15)
- April (14)
- March (21)
- February (16)
- January (13)
2019
- December (17)
- November (14)
- October (16)
- September (10)
- August (8)
- July (16)
- June (11)
- May (13)
- April (18)
- March (12)
- February (19)
- January (23)
2018
- December (15)
- November (14)
- October (19)
- September (18)
- August (23)
- July (20)
- June (20)
- May (23)
- April (15)
- March (23)
- February (19)
- January (23)
2017
- December (21)
- November (24)
- October (22)
- September (21)
- August (23)
- July (21)
- June (24)
- May (21)
- April (21)
- March (23)
- February (20)
- January (23)
2016
- December (17)
- November (18)
- October (22)
- September (18)
- August (23)
- July (22)
- June (17)
- May (24)
- April (16)
- March (16)
- February (21)
- January (21)
2015
- December (5)
- November (10)
- October (9)
- September (17)
- August (20)
- July (17)
- June (4)
- May (12)
- April (9)
- March (8)
- February (25)
- January (17)
2014
- December (22)
- November (19)
- October (21)
- September (37)
- August (24)
- July (23)
- June (13)
- May (19)
- April (24)
- March (23)
- February (21)
- January (24)
2013
- December (23)
- November (29)
- October (27)
- September (26)
- August (24)
- July (24)
- June (23)
- May (25)
- April (26)
- March (24)
- February (24)
- January (21)
2012
- December (19)
- November (22)
- October (27)
- September (24)
- August (30)
- July (23)
- June (25)
- May (23)
- April (25)
- March (25)
- February (28)
- January (24)
2011
- December (17)
- November (14)
- October (24)
- September (28)
- August (27)
- July (30)
- June (19)
- May (16)
- April (30)
- March (23)
- February (11)
- January (26)
2010
- December (29)
- November (28)
- October (35)
- September (33)
- August (44)
- July (17)
- June (20)
- May (53)
- April (29)
- March (35)
- February (33)
- January (36)
2009
- December (37)
- November (35)
- October (53)
- September (60)
- August (66)
- July (29)
- June (24)
- May (52)
- April (63)
- March (35)
- February (53)
- January (50)
2008
- December (58)
- November (65)
- October (46)
- September (48)
- August (96)
- July (87)
- June (45)
- May (51)
- April (52)
- March (70)
- February (43)
- January (49)
2007
- December (100)
- November (52)
- October (109)
- September (68)
- August (80)
- July (56)
- June (150)
- May (115)
- April (73)
- March (124)
- February (102)
- January (68)
2006
- December (95)
- November (53)
- October (120)
- September (57)
- August (88)
- July (54)
- June (103)
- May (89)
- April (84)
- March (143)
- February (78)
- January (64)
2005
- December (70)
- November (97)
- October (91)
- September (61)
- August (74)
- July (92)
- June (100)
- May (53)
- April (42)
- March (41)
- February (84)
- January (31)
2004
- December (49)
- November (26)
- October (26)
- September (6)
- April (10)

RavenDB Workshops - Deep dive into practical use of Document Data Modeling

Apr 10 2014

Sorting with Lucene

time to read 2 min | 352 words

I talked about the Lucene formula and how it calculate things using tf-idf for best matches. Now I want to talk about the actual sorting implementation. As it turned out, the default sorting (by relevancy) is really simple. All you need is to get the relevant score for a query, then you shove the results through a heap with a specified size. The heap will take care of maintain the top results.

So far, that is pretty simple to understand. But the question is, how do you do sorting on a field value? The answer is, not easily.

GetStringIndex() does something very interesting. I returns a string index, which gives us:

A string array containing all the distinct (sorted) value for this index.
A int array with all the documents in the index, with the position of the value of that field in the string value array

Now we can compare fields by their field position on the field, which give us pretty good sorting. Unfortunately, this also require us to load all the values to memory. Let us see another example, which would probably be easier to follow:

Sorting by an integer is done like this:

Get an array (whose size match the number of documents), We can then sort things easily because accessing the relevant field value only require us to have the document id to index into the array.

The reason Lucene does this is that it uses an inverted index, and it has no easy way of going from the field values to the list documents it has. So it is easier to read all the values into memory and work with them there. I don’t like it, but off hand, I can’t think of a better way to handle this.

Tweet Share Share 0 comments

Tags:

reviews

Comments

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

Recording (13):
05 Mar 2024 - Technology & Friends - Oren Eini on the Corax Search Engine
Meta Blog (2):
23 Jan 2024 - I'm a JS Developer now
Production postmortem (51):
12 Dec 2023 - The Spawn of Denial of Service
Challenge (74):
13 Oct 2023 - Fastest node selection metastable error state–answer
Filtering negative numbers, fast (4):
15 Sep 2023 - Beating memcpy()

View all series

RECENT COMMENTS

Scott, Ouch, that is nasty. This gets _really_ complex if you have any non trivial billing setup. For example, for large...

By Oren Eini on Permission to pay your money, please!
You'd be surprised. And this isn't the worst of it. We are a 100% SAAS-architected shop, I have close to a hundred SAAS servi...

By scott hall on Permission to pay your money, please!
Yeah I agree it's a longshot. Just trying to imagine what they could be worried about.

By Jag on Permission to pay your money, please!
Jag, Chargeback is a concern, but that seems like a far off scenario. And the original customer is still _expecting_ to p...

By Oren Eini on Permission to pay your money, please!
Maybe an attack vector could be paying for someone elses service, then doing a chargeback later, causing the account to be cl...

By Jag on Permission to pay your money, please!

Syndication

Main feed
Comments feed

}