I have a large corpus of documents that I’d like to process with Apache Spark against a prebuilt Lucene index.

What I’d like to do is load the Lucene index on each worker, process each partition against it, and use it to determine whether each document contains a feature (which is stored in Lucene).
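The pattern described above is usually done with `mapPartitions` plus a lazily initialized, per-JVM `IndexSearcher`, so the index is opened once per executor rather than once per record. Below is a minimal sketch under several assumptions: the index directory (here the hypothetical path `/data/lucene-index`) is already present on every worker (e.g. on a shared filesystem or shipped with `--files`), the corpus is plain text at a hypothetical HDFS path, Lucene 8+, and a deliberately simplistic "contains feature" check that just runs each document string as an escaped query against a hypothetical `feature` field.

```scala
import java.nio.file.Paths

import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.index.DirectoryReader
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.store.FSDirectory
import org.apache.spark.sql.SparkSession

// Holds one IndexSearcher per executor JVM. The `lazy val` is initialized
// on first use inside a task; IndexSearcher is thread-safe, so concurrent
// tasks in the same executor can share it.
object SearcherHolder {
  lazy val searcher: IndexSearcher = {
    // Hypothetical path: the prebuilt index must exist on every worker.
    val dir = FSDirectory.open(Paths.get("/data/lucene-index"))
    new IndexSearcher(DirectoryReader.open(dir))
  }
}

object FeatureTagger {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("feature-tagger").getOrCreate()
    import spark.implicits._

    val docs = spark.read.textFile("hdfs:///corpus/*.txt") // hypothetical input

    val tagged = docs.mapPartitions { iter =>
      val searcher = SearcherHolder.searcher // opened once per executor
      // QueryParser is NOT thread-safe, so build it per partition, not globally.
      val parser = new QueryParser("feature", new StandardAnalyzer())
      iter.map { doc =>
        // Hypothetical feature check: any hit in the index counts as a match.
        val hits = searcher.search(parser.parse(QueryParser.escape(doc)), 1)
        (doc, hits.totalHits.value > 0)
      }
    }

    tagged.write.csv("hdfs:///corpus-tagged") // hypothetical output
    spark.stop()
  }
}
```

In practice you would replace the escaped-string query with whatever lookup actually encodes "document contains feature" (a `TermQuery` on an ID field, a phrase query, etc.); the important part is the lazy per-executor singleton, which avoids both reopening the index per record and trying to serialize a non-serializable `IndexSearcher` from the driver.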

Is this possible?
