Synthesized is able to connect to both on-prem and cloud data sources.
On-premise available data sources:
Cloud-based available data sources:
Yes, Synthesized supports on-prem installation, and can be easily deployed into MS Azure, AWS, and GCP private cloud.
One can interact with the Synthesized platform through the Web UI, the API, or the SDK. The Web UI offers an intuitive, easy-to-use interface and enables team collaboration. The API allows the platform to be integrated with any external service, and the SDK makes it easy to integrate with CI/CD processes.
The solution supports user accounts and roles, and it can be integrated with an external single sign-on (SSO) service. It also provides full audit capabilities: all actions are written into a service table and can be queried using a REST API.
Synthetic data is data generated by a machine learning model that looks at the original data, learns and understands it, and is able to generate more data that looks, feels, tastes and even smells like the original data. The new data has the same high-level properties as the original, but at the row level it’s completely new and artificial.
Synthetic data can be substituted for the original data, and the same results will be achieved. It also has many other benefits, to name a few:
Synthesized can work with all structured data. This usually, but not exclusively, means tabular data, including flat files (such as Excel spreadsheets and CSVs) and relational databases. All the usual data types (integers, floats, characters, UUIDs, JSON) are handled by the platform.
For each Synthesized data product generated, Synthesized can automatically generate both a data utility and data privacy report that can also be stored, versioned, and documented.
Furthermore, different privacy reports and monitoring features are available on demand, together with an alert system that notifies users when a specific scenario occurs. For example, alerts can be set up for privacy-related scenarios.
Running the engine against a given data source with the default configuration is straightforward: the user just needs to provide connection details for the input and output sources, and that’s all!
If needed, the user can still provide some extra configuration parameters to enforce certain behaviour (e.g. strict rules or implicit referential integrity).
With traditional anonymization techniques, each sample in the output data has a one-to-one mapping with a sample in the original set, which means that these techniques are not robust against complex attacks, such as linkage attacks or attribute inference.
The Synthesized approach, on the other hand, is to learn the data distribution of the underlying phenomena and sample new data points from it. This means that there’s no one-to-one mapping with the original data, which makes it robust against complex disclosure attacks. Read more about the topic here.
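To make the linkage risk concrete, here is a minimal sketch in Python. It is an illustration only and is not part of the Synthesized platform. The records, field names, and the `link` helper are all made up for the example. It shows how a release that drops names but keeps one output row per original row can be re-identified by joining on quasi-identifiers.

```python
# Toy linkage attack. All data and names here are hypothetical.

# "Anonymized" release: names removed, but still one row per original person.
anonymized = [
    {"zip": "10001", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10002", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10003", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
]

# Auxiliary data an attacker might obtain elsewhere (e.g. a public register).
public = [
    {"name": "Alice", "zip": "10001", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "zip": "10002", "birth_year": 1990, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(anonymized_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Map each public name to a diagnosis when exactly one anonymized row matches."""
    index = {}
    for row in anonymized_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    reidentified = {}
    for person in public_rows:
        matches = index.get(tuple(person[k] for k in keys), [])
        if len(matches) == 1:  # a unique match re-identifies the person
            reidentified[person["name"]] = matches[0]["diagnosis"]
    return reidentified


reidentified = link(anonymized, public)
print(reidentified)
```

Because every anonymized row still corresponds to exactly one real person, a unique match on the quasi-identifiers is enough to attach a name to a sensitive attribute.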
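The idea of learning a distribution and sampling from it can be sketched with a deliberately simple stand-in model. This is not the Synthesized model, which is machine-learning based and not described here. The sketch assumes a toy table with hypothetical `age` and `plan` columns, fits each column independently (so it ignores correlations between columns), and draws brand-new rows.

```python
import random
from collections import Counter
from statistics import NormalDist

# Hypothetical original table.
original = [
    {"age": 34, "plan": "basic"},
    {"age": 45, "plan": "premium"},
    {"age": 29, "plan": "basic"},
    {"age": 52, "plan": "premium"},
    {"age": 38, "plan": "basic"},
]


def fit(rows):
    """Learn a simple per-column model: a normal for age, frequencies for plan."""
    age_dist = NormalDist.from_samples([r["age"] for r in rows])
    plan_counts = Counter(r["plan"] for r in rows)
    return age_dist, plan_counts


def sample(model, n, seed=0):
    """Draw n new rows from the fitted model; no row is copied from the input."""
    age_dist, plan_counts = model
    rng = random.Random(seed)
    plans = list(plan_counts)
    weights = [plan_counts[p] for p in plans]
    return [
        {
            "age": round(rng.gauss(age_dist.mean, age_dist.stdev)),
            "plan": rng.choices(plans, weights)[0],
        }
        for _ in range(n)
    ]


synthetic = sample(fit(original), n=5)
print(synthetic)
```

Each synthetic row is drawn from the fitted model and is not a transformation of any particular input row, which is why there is no one-to-one mapping back to the original data.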
Generally speaking, the chances of generating a data point that is present in the original set are insignificant. But if that happens, the Synthesizer can be set to remove those points from the output set.
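Conceptually, removing such points is an anti-join of the synthetic output against the original set. The following Python sketch shows that idea under the assumption that rows are flat dictionaries with hashable values. The function name is hypothetical and this is not the Synthesizer's actual option or implementation.

```python
def drop_rows_present_in_original(synthetic_rows, original_rows):
    """Return the synthetic rows that do not exactly match any original row."""
    # Represent each row as a sorted tuple of (column, value) pairs so it can
    # be stored in a set regardless of key order.
    seen = {tuple(sorted(row.items())) for row in original_rows}
    return [
        row for row in synthetic_rows
        if tuple(sorted(row.items())) not in seen
    ]


# Hypothetical data for illustration.
original = [{"age": 34, "plan": "basic"}, {"age": 45, "plan": "premium"}]
synthetic = [
    {"age": 34, "plan": "basic"},    # exact copy of an original row
    {"age": 41, "plan": "basic"},
    {"plan": "premium", "age": 50},
]

filtered = drop_rows_present_in_original(synthetic, original)
print(filtered)
```

Only exact matches are removed here. Detecting near-duplicates would need a distance measure, which is outside the scope of this sketch.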
Yes! The Synthesized solution allows you to define and configure:
Synthesized supports two ways of attribute-level anonymization: