AnswerIQ Technology

Make Docker Images Smaller with This Trick

The architectural and organizational/process advantages of containerization (e.g., via Docker) are commonly known. However, in constructing images, especially those that serve as the base for other images, adding functionality via package installation is a double-edged sword. On one hand, we want our images to be as useful as possible for the purposes for which they are built. On the other, because images are downloaded, moved around our networks, and live in our production environments, we pay a real price in speed and cost for bloated images. The onus on image creators is to make images as small as practical without sacrificing efficacy and extensibility. This blog post shows how we shrank our images with a pretty simple trick...


Topics: Machine Learning, Data Science, Software Engineering

ParaText: CSV parsing at 2.5 GB per second

Despite extensive use of distributed databases and filesystems in data-driven workflows, there remains a persistent need to rapidly read text files on single machines. Surprisingly, most modern text file readers fail to take advantage of multi-core architectures, leaving much of the I/O bandwidth of high-performance storage systems unused. ParaText, introduced here, reads text files in parallel on a single multi-core machine to consume more of that bandwidth. The alpha release includes a parallel comma-separated values (CSV) reader with Python bindings.
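To make the idea of reading one file in parallel concrete, here is a minimal sketch in plain Python. It is not ParaText's API or implementation; the function names (`chunk_boundaries`, `parse_chunk`, `read_csv_parallel`) are hypothetical and chosen only for illustration. The sketch splits a file into byte ranges that begin on line boundaries and parses each range independently. It assumes UTF-8 input with no newlines inside quoted fields, and it uses a thread pool for simplicity, so it illustrates the chunking step and not the throughput a native multi-core reader could reach.

```python
import csv
import io
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_boundaries(path, num_chunks):
    """Return (start, end) byte ranges that each begin at the start of a line."""
    size = os.path.getsize(path)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, num_chunks):
            target = size * i // num_chunks
            if target <= offsets[-1]:
                continue
            # Step back one byte and finish that line, so the next
            # chunk starts exactly at the beginning of a line.
            f.seek(target - 1)
            f.readline()
            pos = f.tell()
            if offsets[-1] < pos < size:
                offsets.append(pos)
    offsets.append(size)
    return list(zip(offsets[:-1], offsets[1:]))


def parse_chunk(path, start, end):
    """Parse the rows in one byte range. Assumes no quoted newlines."""
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    text = io.StringIO(data.decode("utf-8"), newline="")
    return list(csv.reader(text))


def read_csv_parallel(path, num_chunks=4):
    """Parse each chunk in a worker and concatenate the rows in file order."""
    ranges = chunk_boundaries(path, num_chunks)
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        # pool.map yields results in the same order as the input ranges.
        parts = pool.map(lambda r: parse_chunk(path, *r), ranges)
        return [row for part in parts for row in part]
```

Calling `read_csv_parallel("data.csv", num_chunks=8)` on a hypothetical file returns the same rows, in the same order, as a serial `csv.reader` pass. Because the workers here are Python threads, real gains on CPU-bound parsing would need separate processes or native code.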


Topics: Machine Learning, Data Science, Software Engineering