Thursday, 5 November 2015

Clojure: lazily read random lines from a file

I have a sample data set in a text file. The file is extremely large, so loading it into memory is not an option; I need to read it lazily. Furthermore, I need the lines to be read in a random order, and there might be cases where I don't need to read all of them. This is what I have so far:

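(require '[clojure.java.io :as io])

(defn read-lazy [in-file]
        (letfn [(helper [rdr]
                            ;; lazy-seq defers each .readLine until the next element is requested
                            (lazy-seq
                                (if-let [line (.readLine rdr)]
                                    (cons line (helper rdr))
                                    ;; end of file: close the reader (never reached if the seq is not fully consumed)
                                    (do (.close rdr) nil))))]
            (helper (io/reader in-file))))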

which returns a lazy seq of the file's lines. How can I loop through random lines in the lazy seq when I need to? I think a go block could help here: it could put a random line on a channel and wait for something to consume it. Once that line has been read, it would put another line on the channel and wait for the next read. How can I implement that?
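To make the "random order without loading the file" requirement concrete, here is one possible sketch that does not use channels at all. It assumes a single sequential pass over the file is acceptable: that pass records the byte offset at which each line starts, the offsets are shuffled, and each line is then fetched on demand with a seek. The names `line-offsets` and `random-lines` are made up for illustration. Two caveats: the offset vector holds one number per line in memory, and `RandomAccessFile.readLine` is unbuffered and does not decode multi-byte encodings such as UTF-8, so this only fits ASCII-like data as written.

```clojure
(import '(java.io RandomAccessFile))

(defn line-offsets
  "Scans the file once and returns a vector of the byte offsets
  at which each line starts. (Hypothetical helper.)"
  [^String path]
  (with-open [raf (RandomAccessFile. path "r")]
    (loop [offsets [] pos 0]
      ;; .readLine returns nil at end of file; an empty line is \"\" (truthy)
      (if (.readLine raf)
        (recur (conj offsets pos) (long (.getFilePointer raf)))
        offsets))))

(defn random-lines
  "Returns a lazy seq of the file's lines in random order.
  The file is opened and closed for every line, so no reader
  stays open when the consumer stops early. (Hypothetical helper.)"
  [^String path]
  (map (fn [offset]
         (with-open [raf (RandomAccessFile. path "r")]
           (.seek raf (long offset))
           (.readLine raf)))
       (shuffle (line-offsets path))))
```

With a hypothetical file name, `(take 3 (random-lines "data.txt"))` would then read only a handful of lines. Since `map` over a vector is chunked, up to 32 lines may be fetched at a time rather than exactly one.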

Here's what I have so far (not random):

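(require '[clojure.core.async :refer [chan thread >!! <!! close!]]
         '[clojure.java.io :as io])

(def lazy-ch (chan))

(defn async-fetch-set [in-file]
        ;; >! is only valid inside a go block; blocking file I/O is better run on a real thread with >!!
        (thread
            (with-open [reader (io/reader in-file)]
                (doseq [line (line-seq reader)]
                    (>!! lazy-ch line)))
            (close! lazy-ch)))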

(println "got: " (<!! lazy-ch))

Is this a good way to approach the problem? Is there a better solution? I might not need to read all the lines, so I'd like to be able to close the reader whenever I need to.
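For the "close the reader whenever I need to" part, one possible shape is a producer thread that watches a second channel for a stop signal. This is only a sketch under assumptions: core.async is on the classpath, and `line-chan`, `:lines` and `:stop` are names invented here. Closing the output channel alone would not be enough, because a put that is already pending on a channel stays pending after `close!`, so the producer waits on both channels with `alts!!`.

```clojure
(require '[clojure.core.async :refer [chan thread alts!! <!! close!]]
         '[clojure.java.io :as io])

(defn line-chan
  "Starts a thread that puts the lines of in-file on a channel.
  Returns {:lines ch, :stop ch}; closing :stop makes the producer
  leave with-open (closing the reader) and then close :lines.
  (Hypothetical helper.)"
  [in-file]
  (let [out  (chan)
        stop (chan)]
    (thread
      (with-open [rdr (io/reader in-file)]
        (loop [lines (line-seq rdr)]
          (when-let [[line & more] (seq lines)]
            ;; :priority true checks the stop channel before attempting the put
            (let [[v port] (alts!! [stop [out line]] :priority true)]
              ;; v is true only when the put on out succeeded
              (when (and (= port out) v)
                (recur more))))))
      ;; reached at end of file or after a stop request
      (close! out))
    {:lines out :stop stop}))
```

A consumer would take from `:lines` with `<!!` for as long as it wants lines and call `(close! stop)` on the `:stop` channel when it is done; later takes from `:lines` then return nil. This sketch still reads sequentially, so random order would have to come from somewhere else.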

