Data Library

Zulia Data Library

The zulia-data module is a standalone Java library for reading and writing CSV, TSV, Excel, and JSON data. It provides a unified API with format auto-detection, configurable type handling, and streaming support for large files.

Gradle

repositories {
    mavenCentral()
    maven {
        url "https://maven.ascend-tech.us/repo/"
    }
}

dependencies {
    implementation 'io.zulia:zulia-data:4.10.0'
}

Maven

<dependency>
    <groupId>io.zulia</groupId>
    <artifactId>zulia-data</artifactId>
    <version>4.10.0</version>
</dependency>

Reading Data (Sources)

Auto-Detect Format

The SpreadsheetSourceFactory auto-detects the format from the file extension and creates the appropriate source.

// Read with headers (CSV, TSV, or Excel based on extension)
try (SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithHeaders("/data/input.csv")) {
    for (SpreadsheetRecord record : source) {
        String name = record.getString("name");
        Integer age = record.getInt("age");
    }
}

Header Options

// Standard headers - allows duplicate and blank headers (renames duplicates with _2, _3 suffix)
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithHeaders(filePath);

// Strict headers - no duplicate or blank headers allowed
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithStrictHeaders(filePath);

// No headers - access by column index only
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithoutHeaders(filePath);

From Input Streams

// From a DataInputStream
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromStreamWithHeaders(dataInputStream);

// From a raw InputStream with metadata
DataStreamMeta meta = DataStreamMeta.fromFileName("data.xlsx");
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromSingleUseStreamWithHeaders(inputStream, meta);

Record Access

All record types support both name-based and index-based field access.

for (SpreadsheetRecord record : source) {
    // By column name (requires headers)
    String title = record.getString("title");
    Integer year = record.getInt("pubYear");
    Double score = record.getDouble("score");
    Boolean active = record.getBoolean("active");
    Date created = record.getDate("created");

    // By column index (0-based)
    String first = record.getString(0);
    Integer second = record.getInt(1);

    // With default values
    int count = record.getInt("count", 0);

    // Lists from delimited cell values (default delimiter: ;)
    List<String> tags = record.getList("tags", String.class);
    List<Integer> ids = record.getList("ids", Integer.class);

    // Full row as string array
    String[] row = record.getRow();

    // Available headers
    SequencedSet<String> headers = record.getHeaders();
}

CSV Source

FileDataInputStream input = FileDataInputStream.from("/data/test.csv");
CSVSourceConfig config = CSVSourceConfig.from(input)
    .withHeaders()
    .withDelimiter(',')
    .withListDelimiter(';');

try (CSVSource source = CSVSource.withConfig(config)) {
    for (CSVRecord record : source) {
        String value = record.getString("column");
    }
}

CSV Options

Option	Default	Description
`withHeaders()`	no headers	Read first row as headers
`withStrictHeaders()`	no headers	Headers with no duplicates or blanks allowed
`withDelimiter(char)`	`,`	Field delimiter
`withListDelimiter(char)`	`;`	Delimiter for list values within a cell
`withBooleanParser(Function)`	true/t/yes/y/1	Custom boolean parsing
`withDateParser(Function)`	ISO_DATE_TIME	Custom date parsing

TSV Source

TSVSourceConfig config = TSVSourceConfig.from(input).withHeaders();
try (TSVSource source = TSVSource.withConfig(config)) {
    for (TSVRecord record : source) {
        String value = record.getString("column");
    }
}

Same options as CSV except the delimiter is fixed to tab.

Excel Source

FileDataInputStream input = FileDataInputStream.from("/data/test.xlsx");
ExcelSourceConfig config = ExcelSourceConfig.from(input).withHeaders();

try (ExcelSource source = ExcelSource.withConfig(config)) {
    for (ExcelRecord record : source) {
        String value = record.getString("column");

        // Access native POI objects if needed
        Row nativeRow = record.getNativeRow();
        Cell cell = record.getCell("column");
    }
}

Excel Options

Option	Default	Description
`withHeaders()`	no headers	Read first row as headers
`withStrictHeaders()`	no headers	Headers with no duplicates or blanks allowed
`withListDelimiter(char)`	`;`	Delimiter for list values within a cell
`withExcelCellHandler(handler)`	DefaultExcelCellHandler	Custom cell type conversion
`setOpenHandling(handling)`	FIRST_SHEET	FIRST_SHEET or ACTIVE_SHEET

Multiple Sheets

ExcelSourceConfig config = ExcelSourceConfig.from(input).withHeaders();
try (ExcelSource source = ExcelSource.withConfig(config)) {
    // Read first sheet
    for (ExcelRecord record : source) { ... }

    // Switch to another sheet by index or name
    source.switchSheet(1);
    // or
    source.switchSheet("Sheet2");

    for (ExcelRecord record : source) { ... }

    // Query sheet info
    int count = config.getNumberOfSheets();
    String name = config.getActiveSheetName();
}

JSON Array Source

Reads a JSON array of objects [{...}, {...}].

JsonArraySourceConfig config = JsonArraySourceConfig.from(dataInputStream);
try (JsonArraySource source = JsonArraySource.withConfig(config)) {
    for (JsonSourceRecord record : source) {
        String name = record.getString("name");
        List<String> tags = record.getList("tags", String.class);
    }
}

JSON Lines Source (JSONL)

Reads one JSON object per line.

JsonLineSourceConfig config = JsonLineSourceConfig.from(dataInputStream);
try (JsonLineDataSource source = JsonLineDataSource.withConfig(config)) {
    for (JsonSourceRecord record : source) {
        String name = record.getString("name");
    }
}

Error Handling

// Log and skip bad lines instead of throwing
JsonLineSourceConfig config = JsonLineSourceConfig.from(dataInputStream)
    .withExceptionHandler(new LoggingJsonLineParseExceptionHandler(log));

Writing Data (Targets)

Auto-Detect Format

The SpreadsheetTargetFactory auto-detects the output format from the file extension.

List<String> headers = List.of("name", "age", "active");

// Auto-detect format from extension (CSV, TSV, or XLSX)
try (SpreadsheetTarget<?, ?> target = SpreadsheetTargetFactory.fromFileWithHeaders("/data/out.csv", headers)) {
    target.appendValue("John");
    target.appendValue(30);
    target.appendValue(true);
    target.finishRow();

    target.appendValue("Jane");
    target.appendValue(25);
    target.appendValue(false);
    target.finishRow();
}

Without Headers

try (SpreadsheetTarget<?, ?> target = SpreadsheetTargetFactory.fromFile("/data/out.csv")) {
    target.writeRow("John", 30, true);
    target.writeRow("Jane", 25, false);
}

Custom Configuration

SpreadsheetTarget<?, ?> target = SpreadsheetTargetFactory.fromFileWithHeaders(
    "/data/out.csv", true, headers,
    config -> config.withListDelimiter('|')
);

CSV Target

CSVTargetConfig config = CSVTargetConfig.from(FileDataOutputStream.from("/data/out.csv", true))
    .withHeaders(List.of("name", "score", "date"))
    .withDelimiter(',')
    .withListDelimiter(';');

try (CSVTarget target = CSVTarget.withConfig(config)) {
    target.appendValue("John");
    target.appendValue(3.14);
    target.appendValue(new Date());
    target.finishRow();
}

// Or with defaults
try (CSVTarget target = CSVTarget.withDefaultsFromFile("/data/out.csv", true, headers)) {
    target.writeRow("John", 3.14, new Date());
}

CSV Default Formatting

Custom Type Handlers

CSVTargetConfig config = CSVTargetConfig.from(dataOutputStream)
    .withNumberTypeHandler((writer, value) -> writer.addField(String.format("%.1f", value.doubleValue())))
    .withDateTypeHandler((writer, value) -> writer.addField(new SimpleDateFormat("yyyy-MM-dd").format(value)))
    .withBooleanTypeHandler((writer, value) -> writer.addField(value ? "Y" : "N"));

Excel Target

ExcelTargetConfig config = ExcelTargetConfig.from(FileDataOutputStream.from("/data/out.xlsx", true))
    .withPrimarySheetName("Results")
    .withHeaders(List.of("Name", "Score", "Link"));

try (ExcelTarget target = ExcelTarget.withConfig(config)) {
    target.appendValue("John");
    target.appendValue(99.5);
    target.appendValue(new Link("https://example.com", "Example"));
    target.finishRow();
}

Multiple Sheets

try (ExcelTarget target = ExcelTarget.withConfig(config)) {
    // Write to first sheet (configured above)
    target.writeRow("John", 100);

    // Create additional sheets
    target.newSheet("Summary", List.of("Metric", "Value"));
    target.writeRow("Total", 500);
    target.writeRow("Average", 83.3);

    // Sheet without headers
    target.newSheet("Raw");
    target.writeRow("raw", "data");
}

Excel Default Formatting

Custom Cell Handlers

ExcelTargetConfig config = ExcelTargetConfig.from(dataOutputStream)
    .withNumberTypeHandler((cellRef, value) -> {
        cellRef.cell().setCellValue(value.doubleValue());
        // Access WorkbookHelper for custom styles
        CellStyle style = cellRef.workbookHelper().createOrGetStyle("custom", s -> {
            s.setDataFormat(cellRef.workbookHelper().getWorkbook().createDataFormat().getFormat("#,##0.00"));
        });
        cellRef.cell().setCellStyle(style);
    });

JSON Lines Target

JsonLineDataTargetConfig config = JsonLineDataTargetConfig.from(dataOutputStream);
try (JsonLinesDataTarget target = new JsonLinesDataTarget(config)) {
    Document doc = new Document("name", "John").append("age", 30);
    target.writeRecord(doc);
}

Compression

GZIP compression is handled automatically based on file extension.

// Reading compressed files
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithHeaders("/data/input.csv.gz");

// Writing compressed files
SpreadsheetTarget<?, ?> target = SpreadsheetTargetFactory.fromFile("/data/output.csv.gz");

Input/Output Streams

The library uses its own stream abstractions that carry metadata for format detection.

Reading

// From a file path
FileDataInputStream input = FileDataInputStream.from("/data/test.csv");

// From a raw InputStream (single use, no reset)
DataStreamMeta meta = DataStreamMeta.fromFileName("data.csv");
SingleUseDataInputStream input = SingleUseDataInputStream.from(inputStream, meta);

Writing

// To a file path
FileDataOutputStream output = FileDataOutputStream.from("/data/out.csv", true); // overwrite=true

// To a raw OutputStream
DataStreamMeta meta = DataStreamMeta.fromFileName("data.csv");
SingleUseDataOutputStream output = SingleUseDataOutputStream.from(outputStream, meta);

Data Library

Zulia Data Library

Gradle

Maven

Reading Data (Sources)

Auto-Detect Format

Header Options

From Input Streams

Record Access

CSV Source

CSV Options

TSV Source

Excel Source

Excel Options

Multiple Sheets

JSON Array Source

JSON Lines Source (JSONL)

Error Handling

Writing Data (Targets)

Auto-Detect Format

Without Headers

Custom Configuration

CSV Target

CSV Default Formatting

Custom Type Handlers

Excel Target

Multiple Sheets

Excel Default Formatting

Custom Cell Handlers

JSON Lines Target

Compression

Input/Output Streams

Reading

Writing

Email

Address