Data Library
Zulia Data Library
The zulia-data module is a standalone Java library for reading and writing CSV, TSV, Excel, and JSON data. It provides a unified API with format auto-detection, configurable type handling, and streaming support for large files.
Gradle
repositories {
mavenCentral()
maven {
url "https://maven.ascend-tech.us/repo/"
}
}
dependencies {
implementation 'io.zulia:zulia-data:4.10.0'
}
Maven
<dependency>
<groupId>io.zulia</groupId>
<artifactId>zulia-data</artifactId>
<version>4.10.0</version>
</dependency>
Reading Data (Sources)
Auto-Detect Format
The SpreadsheetSourceFactory auto-detects the format from the file extension and creates the appropriate source.
// Read with headers (CSV, TSV, or Excel based on extension)
try (SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithHeaders("/data/input.csv")) {
for (SpreadsheetRecord record : source) {
String name = record.getString("name");
Integer age = record.getInt("age");
}
}
Header Options
// Standard headers - allows duplicate and blank headers (renames duplicates with _2, _3 suffix)
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithHeaders(filePath);
// Strict headers - no duplicate or blank headers allowed
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithStrictHeaders(filePath);
// No headers - access by column index only
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithoutHeaders(filePath);
From Input Streams
// From a DataInputStream
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromStreamWithHeaders(dataInputStream);
// From a raw InputStream with metadata
DataStreamMeta meta = DataStreamMeta.fromFileName("data.xlsx");
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromSingleUseStreamWithHeaders(inputStream, meta);
Record Access
All record types support both name-based and index-based field access.
for (SpreadsheetRecord record : source) {
// By column name (requires headers)
String title = record.getString("title");
Integer year = record.getInt("pubYear");
Double score = record.getDouble("score");
Boolean active = record.getBoolean("active");
Date created = record.getDate("created");
// By column index (0-based)
String first = record.getString(0);
Integer second = record.getInt(1);
// With default values
int count = record.getInt("count", 0);
// Lists from delimited cell values (default delimiter: ;)
List<String> tags = record.getList("tags", String.class);
List<Integer> ids = record.getList("ids", Integer.class);
// Full row as string array
String[] row = record.getRow();
// Available headers
SequencedSet<String> headers = record.getHeaders();
}
CSV Source
FileDataInputStream input = FileDataInputStream.from("/data/test.csv");
CSVSourceConfig config = CSVSourceConfig.from(input)
.withHeaders()
.withDelimiter(',')
.withListDelimiter(';');
try (CSVSource source = CSVSource.withConfig(config)) {
for (CSVRecord record : source) {
String value = record.getString("column");
}
}
CSV Options
| Option | Default | Description |
|---|---|---|
withHeaders() |
no headers | Read first row as headers |
withStrictHeaders() |
no headers | Headers with no duplicates or blanks allowed |
withDelimiter(char) |
, |
Field delimiter |
withListDelimiter(char) |
; |
Delimiter for list values within a cell |
withBooleanParser(Function) |
true/t/yes/y/1 | Custom boolean parsing |
withDateParser(Function) |
ISO_DATE_TIME | Custom date parsing |
TSV Source
TSVSourceConfig config = TSVSourceConfig.from(input).withHeaders();
try (TSVSource source = TSVSource.withConfig(config)) {
for (TSVRecord record : source) {
String value = record.getString("column");
}
}
Same options as CSV except the delimiter is fixed to tab.
Excel Source
FileDataInputStream input = FileDataInputStream.from("/data/test.xlsx");
ExcelSourceConfig config = ExcelSourceConfig.from(input).withHeaders();
try (ExcelSource source = ExcelSource.withConfig(config)) {
for (ExcelRecord record : source) {
String value = record.getString("column");
// Access native POI objects if needed
Row nativeRow = record.getNativeRow();
Cell cell = record.getCell("column");
}
}
Excel Options
| Option | Default | Description |
|---|---|---|
withHeaders() |
no headers | Read first row as headers |
withStrictHeaders() |
no headers | Headers with no duplicates or blanks allowed |
withListDelimiter(char) |
; |
Delimiter for list values within a cell |
withExcelCellHandler(handler) |
DefaultExcelCellHandler | Custom cell type conversion |
setOpenHandling(handling) |
FIRST_SHEET | FIRST_SHEET or ACTIVE_SHEET |
Multiple Sheets
ExcelSourceConfig config = ExcelSourceConfig.from(input).withHeaders();
try (ExcelSource source = ExcelSource.withConfig(config)) {
// Read first sheet
for (ExcelRecord record : source) { ... }
// Switch to another sheet by index or name
source.switchSheet(1);
// or
source.switchSheet("Sheet2");
for (ExcelRecord record : source) { ... }
// Query sheet info
int count = config.getNumberOfSheets();
String name = config.getActiveSheetName();
}
JSON Array Source
Reads a JSON array of objects [{...}, {...}].
JsonArraySourceConfig config = JsonArraySourceConfig.from(dataInputStream);
try (JsonArraySource source = JsonArraySource.withConfig(config)) {
for (JsonSourceRecord record : source) {
String name = record.getString("name");
List<String> tags = record.getList("tags", String.class);
}
}
JSON Lines Source (JSONL)
Reads one JSON object per line.
JsonLineSourceConfig config = JsonLineSourceConfig.from(dataInputStream);
try (JsonLineDataSource source = JsonLineDataSource.withConfig(config)) {
for (JsonSourceRecord record : source) {
String name = record.getString("name");
}
}
Error Handling
// Log and skip bad lines instead of throwing
JsonLineSourceConfig config = JsonLineSourceConfig.from(dataInputStream)
.withExceptionHandler(new LoggingJsonLineParseExceptionHandler(log));
Writing Data (Targets)
Auto-Detect Format
The SpreadsheetTargetFactory auto-detects the output format from the file extension.
List<String> headers = List.of("name", "age", "active");
// Auto-detect format from extension (CSV, TSV, or XLSX)
try (SpreadsheetTarget<?, ?> target = SpreadsheetTargetFactory.fromFileWithHeaders("/data/out.csv", headers)) {
target.appendValue("John");
target.appendValue(30);
target.appendValue(true);
target.finishRow();
target.appendValue("Jane");
target.appendValue(25);
target.appendValue(false);
target.finishRow();
}
Without Headers
try (SpreadsheetTarget<?, ?> target = SpreadsheetTargetFactory.fromFile("/data/out.csv")) {
target.writeRow("John", 30, true);
target.writeRow("Jane", 25, false);
}
Custom Configuration
SpreadsheetTarget<?, ?> target = SpreadsheetTargetFactory.fromFileWithHeaders(
"/data/out.csv", true, headers,
config -> config.withListDelimiter('|')
);
CSV Target
CSVTargetConfig config = CSVTargetConfig.from(FileDataOutputStream.from("/data/out.csv", true))
.withHeaders(List.of("name", "score", "date"))
.withDelimiter(',')
.withListDelimiter(';');
try (CSVTarget target = CSVTarget.withConfig(config)) {
target.appendValue("John");
target.appendValue(3.14);
target.appendValue(new Date());
target.finishRow();
}
// Or with defaults
try (CSVTarget target = CSVTarget.withDefaultsFromFile("/data/out.csv", true, headers)) {
target.writeRow("John", 3.14, new Date());
}
CSV Default Formatting
| Type | Default Format | |——|—————| | Number (float/double) | 3 decimal places | | Date | ISO_DATE_TIME | | Boolean | “True” / “False” | | Collection | Joined with list delimiter | | Link | href value only |
Custom Type Handlers
CSVTargetConfig config = CSVTargetConfig.from(dataOutputStream)
.withNumberTypeHandler((writer, value) -> writer.addField(String.format("%.1f", value.doubleValue())))
.withDateTypeHandler((writer, value) -> writer.addField(new SimpleDateFormat("yyyy-MM-dd").format(value)))
.withBooleanTypeHandler((writer, value) -> writer.addField(value ? "Y" : "N"));
Excel Target
ExcelTargetConfig config = ExcelTargetConfig.from(FileDataOutputStream.from("/data/out.xlsx", true))
.withPrimarySheetName("Results")
.withHeaders(List.of("Name", "Score", "Link"));
try (ExcelTarget target = ExcelTarget.withConfig(config)) {
target.appendValue("John");
target.appendValue(99.5);
target.appendValue(new Link("https://example.com", "Example"));
target.finishRow();
}
Multiple Sheets
try (ExcelTarget target = ExcelTarget.withConfig(config)) {
// Write to first sheet (configured above)
target.writeRow("John", 100);
// Create additional sheets
target.newSheet("Summary", List.of("Metric", "Value"));
target.writeRow("Total", 500);
target.writeRow("Average", 83.3);
// Sheet without headers
target.newSheet("Raw");
target.writeRow("raw", "data");
}
Excel Default Formatting
| Type | Default Format | |——|—————| | String | Plain text (truncated at 32K chars) | | Number (float/double) | 0.00 format | | Date | m/d/yy | | Boolean | Native boolean cell | | Link | Blue underlined hyperlink | | Header | Bold, centered, 10pt | | Collection | Joined with list delimiter |
Custom Cell Handlers
ExcelTargetConfig config = ExcelTargetConfig.from(dataOutputStream)
.withNumberTypeHandler((cellRef, value) -> {
cellRef.cell().setCellValue(value.doubleValue());
// Access WorkbookHelper for custom styles
CellStyle style = cellRef.workbookHelper().createOrGetStyle("custom", s -> {
s.setDataFormat(cellRef.workbookHelper().getWorkbook().createDataFormat().getFormat("#,##0.00"));
});
cellRef.cell().setCellStyle(style);
});
JSON Lines Target
JsonLineDataTargetConfig config = JsonLineDataTargetConfig.from(dataOutputStream);
try (JsonLinesDataTarget target = new JsonLinesDataTarget(config)) {
Document doc = new Document("name", "John").append("age", 30);
target.writeRecord(doc);
}
Compression
GZIP compression is handled automatically based on file extension.
// Reading compressed files
SpreadsheetSource<?> source = SpreadsheetSourceFactory.fromFileWithHeaders("/data/input.csv.gz");
// Writing compressed files
SpreadsheetTarget<?, ?> target = SpreadsheetTargetFactory.fromFile("/data/output.csv.gz");
Input/Output Streams
The library uses its own stream abstractions that carry metadata for format detection.
Reading
// From a file path
FileDataInputStream input = FileDataInputStream.from("/data/test.csv");
// From a raw InputStream (single use, no reset)
DataStreamMeta meta = DataStreamMeta.fromFileName("data.csv");
SingleUseDataInputStream input = SingleUseDataInputStream.from(inputStream, meta);
Writing
// To a file path
FileDataOutputStream output = FileDataOutputStream.from("/data/out.csv", true); // overwrite=true
// To a raw OutputStream
DataStreamMeta meta = DataStreamMeta.fromFileName("data.csv");
SingleUseDataOutputStream output = SingleUseDataOutputStream.from(outputStream, meta);