Thursday, October 29, 2015

Best way to chunk a large string by line

I have a large file (400+ MB) that I'm reading from S3 using get_contents_as_string(), which means I end up with the entire file in memory as a single string. I'm running several other memory-intensive operations in parallel, so I need a memory-efficient way of splitting that string into chunks by line number. Is split() efficient enough, or is something like re.finditer() a better way to go?
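One consideration: split("\n") materializes a list containing a separate string for every line, roughly doubling peak memory for a 400 MB input, and re.finditer() adds regex overhead without removing that cost if you still collect the pieces. A sketch of one memory-conscious alternative, assuming "chunks" means a fixed number of lines per chunk (the chunk size and function name here are illustrative, not from the question): wrap the string in io.StringIO and iterate it, which yields one line at a time instead of building the full list.

```python
import io

def iter_line_chunks(text, lines_per_chunk):
    """Yield successive chunks of `lines_per_chunk` lines from `text`.

    Iterating an io.StringIO yields one line at a time, so only the
    current chunk is held as extra data (beyond the original string),
    rather than a list of every line at once.
    """
    chunk = []
    for line in io.StringIO(text):  # lazy line-by-line iteration
        chunk.append(line)
        if len(chunk) == lines_per_chunk:
            yield "".join(chunk)
            chunk = []
    if chunk:  # trailing partial chunk
        yield "".join(chunk)

# Example: split a 5-line string into chunks of 2 lines.
chunks = list(iter_line_chunks("a\nb\nc\nd\ne\n", 2))
# chunks == ["a\nb\n", "c\nd\n", "e\n"]
```

Note that StringIO makes its own copy of the string on construction, so if peak memory is the hard constraint, streaming the S3 object line by line (rather than downloading it whole with get_contents_as_string()) would avoid ever holding the full file in memory.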
