anik/dataset — A PHP package to export DB tables data to CSV, import CSV data to DB tables.

Syed Sirajul Islam Anik
15 min readAug 12, 2020
By Mahmudul Hasan Shaon on https://unsplash.com/photos/QTPJWJBQO90

More than 3 years ago, I wrote a PHP package to read data from CSV and import that into the DB table. Upon finishing that, I wanted to do the vice versa. But didn’t pay that much attention. For the last 3 years, I tried rewriting that 2–3 times but couldn’t finish. This time, I started with full focus to completely rewrite the package and merge both the features into one package. Finally, today I finished both the features within 48 hours of continuous coding with 5.5 hours of sleep.

What does this package provide?

This package lets you insert records into multiple tables (if required) while reading data from CSV with less amount of coding required. On the other hand, this package lets you export data from multiple tables to a single CSV file. In both cases, you can compute on the fly and decide what can be the value that is not available in your data source but can be retrieved with a bit of computation on the available records.

Installation

To install the package, you will need to have the composer in your local machine. Then you can run the following code that will install the package in your project.

composer require "anik/dataset" "^3.0"

Requirements

The server should have PHP version >= 7.2. That’s all. The package depends on illuminate/database to fetch and push data to the database. And doctrine/inflector to do some guesses if you didn’t provide a few pieces of information. So, if you’re familiar with Laravel/Lumen’s Query Builder then it’ll be a plus for you. But, if you don’t know then I’d recommend having a look at the attached link. Most of the time the package does all types of querying without any headache. But if you need to customize, then you’ll require to write queries on your own. Otherwise, we’re all good. The package doesn’t force any developer to use it with Laravel, but if you use it with Laravel/Lumen, you’ll have some privilege.

How to use the package?

First, let’s split the usage. We’ll first cover how to import data from CSV to Database. Then we’ll cover how to export data from Database tables to CSVs.

Importing data from CSV to Database tables

To import data from your CSV file to database tables, you’ll need to write a class that extends the Dataset\CsvStorage class. For this article, let’s assume we’ll import the companies.csv file into our database table. So, our companies CSV file looks like the following.

name,image_url
Libero Morbi Foundation,http://placehold.it/350x150
Morbi Incorporated,http://placehold.it/350x150
Imperdiet Limited,http://placehold.it/350x150
Enim Sed Limited,http://placehold.it/350x150
Leo Vivamus Consulting,http://placehold.it/350x150
Feugiat Company,http://placehold.it/350x150
Lobortis Consulting,http://placehold.it/350x150
Nunc Pulvinar Incorporated,http://placehold.it/350x150
Dolor Tempus Non PC,http://placehold.it/350x150
Feugiat Tellus Lorem Company,http://placehold.it/350x150

The 1st line contains the header — the column names each field represents. And from lines 2 to 11, it contains data. And our class will be something like the following.

<?phpnamespace App\Readers;use Dataset\CsvStorage;
use Illuminate\Database\Connection;
use League\Csv\Reader;
use Throwable;
class Company extends CsvStorage
{
/*protected function raisedException (Throwable $t) : void {}

protected function exitOnError () : bool {}

protected function table () : string {}

public function db () : Connection {}
public function connection () : string {} protected function filterInput (array $record) : array {}

protected function useTransaction () : bool {}

protected function filename () : string {}

protected function fileOpenMode () : string {}

protected function streamFilters () : array {}

protected function headerOffset () : ?int {}

protected function mutation (array $record) : array {}

protected function entries () : array {}

protected function headers () : array {}
protected function delimiterCharacter () : string {} protected function enclosureCharacter () : string {} protected function escapeCharacter () : string {} protected function limit () : int {} protected function skipEmptyRecord () : bool {} protected function getReader () : Reader {} protected function type () : string {}*/
}

If you have a database table named companies that only has fields name and image_url then only the class declaration and (new Company())->import() will import all the rows in your database table. No more code write. As simple as this. 🍻

Explanations of the class methods of CsvStorage.

  • table — Dataset guesses your table name as a snake_case pluralized format. If you need to use a different table name, return that name from the table method.
  • connection — If you’re using outside Laravel, then you’ll have to return the connection name. When adding connections to your capsule manager, if no name is provided, then it’ll be default by default. If you provide any name, then you’ll have to return that connection name. When using with Laravel, either return any available connection name or config(‘database.default’).
  • filename — The class guesses that your CSV file’s location is in the same directory the extended class is. And the file name is like the table method, the snake_case pluralized format. So, if Company is the name of your class, then the file should be in the same directory named companies.csv. But if you want to use any different name for the file, or any different path then return the path for the file location.
  • fileOpenMode — The method returns a string representing the file open mode. Default r.
  • exitOnError — While inserting any data in your tables, if it catches any exception then should it continue or break the operation. true is used by default.
  • raisedException — The method receives the exception which was encountered throughout the code execution. Even if exitOnError returned true or false. A place where you may want to log the exceptions. This method is only called when trying to insert records into the database tables. Any exception before that is not handled.
  • useTransaction — If you want persistency in your database tables, then you can return a boolean value from this method denoting if you want to implement the database transaction or not. Default is false.
  • headerOffset — If your CSV file contains any header, then return the row position from this method. The default is 0 (zero). The first line contains the header. If there is no header, then return null. The header is not added to the tables.
  • streamFilters — If you want to apply any stream filters in your CSV, then you can return an array of filters from this method. Default is [].
  • headers — Even if your CSV file contains the header mentioned in headerOffset method, returning an array from here will overwrite those header values. Values don’t have any impact on these headers. Default is []. If your headerOffset method returned null, but you handle columns in an associative way, return an array from this method. The order will be maintained.
  • db — This method returns the Illuminate\Database\Capsule\Manager instance. If you want to place any manipulation on your Manager class, like setting connection and some other stuff, you can do that there and return the Manager. If you’re using with Laravel, you can return app(‘db.connection’) or app(‘db’)->connection($this->connection()).
  • mutation — The mutation method is the place where you add new values to your record row. Manipulate the existing record. Do calculations and insert extra columns in your record. Whatever you return from this method will take effect on your record. If you overwrite any existing value, it’ll be permanent. The $record array will be associative if your CSV has headers, or you passed it through the headers method.
  • filterInput — This method is the last place where you want to decide which columns are about to be committed to the table. You can either unset key from the array and return or return a set of fields from this method. After you return an array, only those records will be inserted in your table. The default is $record array is returned from. Whatever was fetched from the CSV will be inserted into the table.
  • entries — This method is responsible for inserting the record in your table. By default, it inserts the record that was filtered through the filterInput method, all of them will be inserted. This method should return an associative array where the keys represent the table names and value will be closure/anonymous function that will receive the parameters of Illuminate\Database\Eloquent\Model a model instance, array $record, and array $previous. The model instance will have the table and connection properties set. If you’re familiar with Eloquent’s created_at and updated_at then, it’s disabled by default. You’ll have to set those values on your own. Or enable it if your table requires so. You can play with the model. Next, the $record parameter contains the filtered input of your CSV row. Then the $previous array contains results of whatever was returned by the previous closures. The key on the $prevoius array is the table name and the value for that key is the result that was returned by the table’s handler closure. If you’re overriding the behavior of the entries method, you should make sure you’re saving the model. And whatever is returned from this method will be stored in the $prevoius array. Look at the example code if you want to put data into multiple tables. Or testMultipleTableEntriesCheckingDuplicateBeforeEntry test case if you want to use an update query on your model.
  • delimiterCharacter — The character that delimited the CSV file. Default is ,. Must be the length of 1.
  • enclosureCharacter — Character enclosure. Default . Must be the length of 1.
  • escapeCharacter — Escape character. Default is \. Must be the length of 1.
  • limit — Limit for each batch processing. Takes that no of rows in each batch. Default is 1000.
  • skipEmptyRecord — Skip the row if the record is empty. Default true.
  • getReader — The method should return the League\Csv\Reader instance. If you want to do anything before starting to read on your reader, it’s the method to work on.
  • type — Renaming the events. Check the event section below.

Events for CsvStorage

If you’re familiar with Laravel’s event system, you can use the events produced by the package. It produces a handful number of events. But when listening to events, if your listener returns (bool) false as a return value, then the execution will stop immediately. To enable events for the package, you can do any of the following.

// If you're using with Laravel
// Company is our class instance as above
// app() method returns the Laravel/Lumen application instance with
// Laravel/Lumen applications
// Either with Laravel
$instance = new Company(app());
// 1. Or outside Laravel
use Illuminate\Events\Dispatcher;
use Illuminate\Container\Container;
$container = new Container();
$disptcher = new Dispatcher($container);
$container->bind('events', function () use ($dispatcher) {
return $dispatcher;
});
$instance = new Company($container);// 2. Or outside laravel
$instance = new Company();
$instance->setEventDispatcher($dispatcher);

List of events:

  • dataset.reader.starting — When starting the import operation.
  • dataset.reader.preparing_reader — Before calling the getReader method.
  • dataset.reader.iteration.started — Before reading data from the source upon getting a successful reader.
  • dataset.reader.exception — In case an exception occurred.
  • dataset.reader.iteration.batch — When is about to process a batch.
  • dataset.reader.iteration.stopped — In the case of dataset.reader.iteration.batch event handlers returned false or exitOnError() method returns true;
  • dataset.reader.iteration.completed — When the execution has completed for either successful or unsuccessful execution.
  • dataset.reader.exiting — If any of the events returned false, throughout the execution, it immediately fires this event and stops execution.

NOTE: When overriding the type method of the explanation section, it’ll be used in the events name replacing reader in the above events.

Exporting data from Database tables to CSV

To export data from your database tables to CSV, you’ll need to write a class that extends the Dataset\DatabaseStorage class. For this article, let’s assume we’ll export the users table to users.csv. So, our users table looks like the following.

1 A 35 2020-08-01 21:05:45 2020-08-01 21:05:45
2 B 21 2020-08-01 21:05:46 2020-08-01 21:05:46
3 C 40 2020-08-01 21:05:46 2020-08-01 21:05:46
4 C 20 2020-08-01 21:05:46 2020-08-01 21:05:46
5 E 43 2020-08-01 21:05:46 2020-08-01 21:05:46
6 F 25 2020-08-01 21:05:46 2020-08-01 21:05:46
-- columns below
-- id, name, age, created_at, updated_at

And our class will be something like the following.

<?php

namespace App\Writers;

use Dataset\DatabaseStorage;
use Illuminate\Database\Connection;
use Illuminate\Database\Query\Builder;
use Illuminate\Database\Query\Expression;
use League\Csv\Writer;
use Closure;
use Throwable;

class User extends DatabaseStorage
{
/*public function db () : Connection {}

protected function condition () : Closure {}
protected function joins () : array {} protected function orderBy () : Expression {} protected function orderDirection () : string {} protected function columns () : array {} protected function headers () : array {} protected function limit () : int {} protected function fetchUsing () : string {} protected function mutation (array $record) : array {} protected function filterOutput (array $record) : array {} protected function exitOnError () : bool {} protected function getWriter () : Writer {} protected function getBuilder () : Builder {} protected function connection () : string {} protected function table () : string {} public function filename () : string {} protected function type () : string {} protected function fileOpenMode () : string {} protected function delimiterCharacter () : string {} protected function enclosureCharacter () : string {} protected function escapeCharacter () : string {} protected function raisedException (Throwable $t) : void {}*/
}

That’s all. If you have a database table named users then it’ll grab all the data from that table and put in the CSV file named users.csv in the class’s directory. Ain’t that simple?

Explanations of the class methods of DatabaseStorage

  • table — Dataset guesses your table name as a snake_case pluralized format. If you need to use a different table name, return that name from the table method.
  • connection — If you’re using outside Laravel, then you’ll have to return the connection name. When adding connections to your capsule manager, if no name is provided, then it’ll be default by default. If you provide any name, then you’ll have to return that connection name. When using with Laravel, either return any available connection name or config(‘database.default’).
  • filename — The class guesses that your CSV file’s location is in the same directory the extended class is. And the file name is like the table method, the snake_case pluralized format. So, if User is the name of your class, then the file it’ll output be in the same directory named users.csv. But if you want to use any different name for the file, or any different path then return the path for the file location.
  • fileOpenMode — The method returns a string representing the file open mode. Default w+.
  • exitOnError — While inserting any data in your CSV, if it catches any exception then should it continue or break the operation. true is used by default.
  • raisedException — The method receives the exception which was encountered throughout the code execution. Even if exitOnError returned true or false. A place where you may want to log the exceptions. This method is only called when trying to insert records into the CSV file. Any exception before that is not handled.
  • db — This method returns the Illuminate\Database\Capsule\Manager instance. If you want to place any manipulation on your Manager class, like setting connection and some other stuff, you can do that there and return the Manager. If you’re using with Laravel, you can return app(‘db.connection’) or app(‘db’)->connection($this->connection()).
  • mutation — The mutation method is the place where you add new values to your record row. Manipulate the existing record. Do calculations and insert extra columns in your record. Whatever you return from this method will take effect on your record. If you overwrite any existing value, it’ll be permanent. $record as the parameter is an associative array where keys contain the table fields and the values are the column values.
  • headers — If you want to add a header to your CSV file, then return a set of header values as an associative array. The key of the array denoting the table fields and the values denoting the CSV header’s column names. Default is []. Make sure you don’t return extra fields that are not available in the table column or won’t be added through the mutation method. This will write an unpredictable value in the CSV file. This method is also liable for writing which column values are going to be written to the file.
  • filterOutput — The method is called only if you returned an empty array from the headers method. This method is the last place where you can decide which fields to include in the CSV. If you return an empty array, then the row will be skipped. Receives the mutated record as $record. Returns the $record by default.
  • delimiterCharacter — The character that delimited the CSV file. Default is ,. Must be the length of 1.
  • enclosureCharacter — Character enclosure. Default . Must be the length of 1.
  • escapeCharacter — Escape character. Default is \. Must be the length of 1.
  • limit — Limit for each batch processing. Pulls that no of rows in each batch from table. Default is 5000.
  • getWriter — The method should return the League\Csv\Writer instance. If you want to do anything before starting to write on your file, it’s the method to work on.
  • type — Renaming the events. Check the event section below.
  • condition — The method is the place where you should write the business condition logic. Suppose, you want to pull users only for the last 30 days, this is the method where you should reflect that. This method should return a \Closure. An instance of Illuminate\Database\Query\Builder $query will be given to the closure as the parameter. Check the documentation if you don’t have a Laravel background. By default, it doesn’t use any condition.
  • orderBy — If you want to set an order for your result set, you can use this method. Must return Illuminate\Database\Query\Expression. By default, orders by id.
  • orderDirection — The direction you want to order your results. Must return string. Default is ASC. Can be DESC otherwise.
  • fetchUsing — Dataset provides you two ways of pulling data from tables. Either using database cursor or illuminate/database package’s chunk method. Can return any of them. Default is cursor.
  • joins — If you want to generate your CSV based on multiple tables, you can run joins on your tables. This method must return an array of arrays. Each element of the array containing values in this order. [next_table, first_table_join_on, operator, second_table_join_on, join_type, where_condition]. Element array can be associative too. Then array keys should have[table, first, operator, second, type, where]. Check the testJoinFunction test case for understanding it. key 'operator' or index 2 guesses = as default. key 'type' or index 4 guesses inner as default. key 'where' or index 5 guesses false as default.
  • columns — Must return an array containing valid table column names. Default is [‘*’] which denotes all fields. CAVEAT: The result returned by the illuminate/database is an array. So, if you’re fetching all fields then having two columns the same name will overwrite one another. Thus make sure you’re not overwriting the value by mentioning the column names with an alias. To alias a field, you can use Manager::connection($connection)->raw('field_name as new_name') or $this->db()->raw(‘field_name as new_name’). With Laravel, you can use app(‘db’)->raw(‘field_name as new_name’). You can also introduce extra values that are not present in the database. This method decides which fields are passed to mutate function.
  • getBuilder — Must return Illuminate\Database\Query\Builder instance. If you want to customize your select operations, joining, order by, and condition queries, you can skip the representing methods and just do it in this method. Make sure you don’t put the limit and offset in this method. It’s handled separately.

Events for DatabaseStorage

If you’re familiar with Laravel’s event system, you can use the events produced by the package. It produces a handful number of events. But when listening to events, if your listener returns (bool) false as a return value, then the execution will stop immediately. To enable events for the package, you can do any of the following.

// If you're using with Laravel
// Company is our class instance as above
// app() method returns the Laravel/Lumen application instance with
// Laravel/Lumen applications
// Either with Laravel
$instance = new User(app());
// 1. Or outside Laravel
use Illuminate\Events\Dispatcher;
use Illuminate\Container\Container;
$container = new Container();
$disptcher = new Dispatcher($container);
$container->bind('events', function () use ($dispatcher) {
return $dispatcher;
});
$instance = new User($container);// 2. Or outside laravel
$instance = new User();
$instance->setEventDispatcher($dispatcher);

List of events:

  • dataset.writer.starting — When starting the export operation.
  • dataset.writer.preparing_writer — Before calling the getWriter method.
  • dataset.writer.iteration.started — Before reading data from the source upon getting a successful writer.
  • dataset.writer.exception — In case an exception occurred.
  • dataset.writer.iteration.batch — When is about to process a batch.
  • dataset.writer.iteration.stopped — In the case of dataset.writer.iteration.batch event handlers returned false or exitOnError() method returns true;
  • dataset.writer.iteration.completed — When the execution has completed for either successful or unsuccessful execution.
  • dataset.writer.exiting — If any of the events returned false, throughout the execution, it immediately fires this event and stops execution.

NOTE: When overriding the type method of the explanation section, it’ll be used in the events name replacing writer in the above events.

Event values

The reader and writer events with values. Quoted string denotes it’s a string and as is. Bold denotes the event handlers value has an effect on execution. All the below events will have class as the first parameter when handling the event. Plus extra parameters are mentioned below.

--- Reader events ---
'dataset.reader.starting'
NOTHING_WILL_BE_GIVEN
'dataset.reader.preparing_reader'
'file': name_of_the_file_it_s_handling
'dataset.reader.exception'
'error': throwable_instance
'records': unprocessed_filtered_resultset_for_this_batch
'dataset.reader.iteration.stopped'
'uses': 'reader.source'
'dataset.reader.iteration.completed'
'uses': 'reader.source',
'completed': boolean_has_successfully_completed,
'dataset.reader.iteration.batch'
'batch': int_value_of_current_batch_no
'count': int_value_of_batch_record_count
'limit': int_value_of_limit_method
'dataset.reader.iteration.started'
'uses': 'reader.source'
'limit': int_value_of_limit_method
'dataset.reader.exiting'
'event': previous_event_on_which_handler_returned_false

--- Writer events ---
'dataset.writer.starting'
NOTHING_WILL_BE_GIVEN
'dataset.writer.preparing_writer'
'file': name_of_the_file_it_s_handling
'dataset.writer.exception'
'error': throwable_instance
'records': current_batch_unprocessed_record
'dataset.writer.iteration.stopped'
'uses': 'writer.cursor' // either
'uses': 'writer.chunk' // or
'dataset.writer.iteration.completed'
'uses': 'writer.cursor'
'uses': 'writer.chunk'
'completed': boolean_has_successfully_completed
'dataset.writer.iteration.batch'
'batch': int_value_of_current_batch_no
'count': int_value_of_batch_record_count
'limit': int_value_of_limit_method
'dataset.writer.iteration.started'
'uses': 'writer.chunk' // either
'uses': 'writer.cursor' // or
'limit': int_value_of_limit_method
'dataset.writer.exiting'
'event': previous_event_on_which_handler_returned_false

Hope it covers all the points of this package. Go through the examples directory or tests. It covers almost all the points. Don’t forget to put a star in the repository.

Happy coding. ❤

--

--

Syed Sirajul Islam Anik

procrastinator | programmer | !polyglot | What else 🙄 — Open to Remote