Frequently Asked Questions
Table of Contents
- How quickly will my streaming data be available in Athena?
- Can Upsolver handle large volumes of messages per second?
- Can I send messages from the same input to different output fields?
- Does creating different outputs from the same input require reading the data source (e.g. Amazon Kinesis) multiple times?
- Does Upsolver store the original JSON messages on S3?
- Can I edit existing tables to reflect changes in the data?
- How does Upsolver output data to Athena?
- How do you ensure that Athena tables are efficiently queryable over time? Do you change the partition or files over time?
- Can Upsolver be used with Snowflake?
- What level of security do you offer?
- What kind of access control does Upsolver provide?
- Does Upsolver store any of my data?
- What is your support model?
- What is your pricing model?
- How does the system scale?
- How is the system managed?
How quickly will my streaming data be available in Athena?
Using Upsolver, your data should be available in Athena within 5 minutes of appearing in your stream (e.g. Kafka or Amazon Kinesis), and often within 2-3 minutes.
Can Upsolver handle large volumes of messages per second?
In the Upsolver architecture, storage and compute are decoupled. Upsolver handles increases in message volume by scaling out the compute cluster, and you can choose a scaling strategy that maintains consistently low latency.
Can I send messages from the same input to different output fields?
Yes, using the Upsolver UI you can define filters that will apply to that output, based on a field or fields from the event stream. For highly complex parsing you should use the Upsolver API.
Does creating different outputs from the same input require reading the data source (e.g. Amazon Kinesis) multiple times?
The source is read only a single time. Data is written to S3 and then distributed to multiple outputs. We do this because keeping a copy of the raw data on S3 is cheaper than retaining it in Kinesis/Kafka, retention can be longer, and there is no risk that one output will slow down the others.
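The read-once, fan-out-from-S3 pattern described above can be sketched as follows. This is a minimal illustration with an in-memory stand-in for S3; the function and output names are hypothetical, not Upsolver's API:

```python
# Minimal sketch of the read-once / fan-out pattern. The "s3" dict stands
# in for the raw-data bucket; the output handlers are hypothetical.

s3 = {}  # raw event store, keyed by stream offset

def ingest(source_events):
    """Read the source (Kinesis/Kafka) exactly once, persisting raw copies."""
    for offset, event in enumerate(source_events):
        s3[offset] = event

def fan_out(outputs):
    """Each output reads from the S3 copy, so a slow output never blocks
    the others and never forces a re-read of the source stream."""
    return {
        name: [transform(event) for event in s3.values()]
        for name, transform in outputs.items()
    }

events = [{"user": "a", "amount": 5}, {"user": "b", "amount": 7}]
ingest(events)
out = fan_out({
    "athena": lambda e: e,                    # full event for the lake
    "alerts": lambda e: {"user": e["user"]},  # projected subset
})
```

The key property is that adding a third output changes only the `outputs` dict, not the ingestion path.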
Does Upsolver store the original JSON messages on S3?
The historical JSON files are batched together and kept in compressed Avro for higher performance and lower cost of storage. Access to historical data is available via the Replay feature.
Can I edit existing tables to reflect changes in the data?
Yes, Upsolver allows you to alter existing tables in Athena, including adding new columns to a table that's already in use. These changes take effect both going forward and retroactively, based on the timestamp you choose.
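As an illustration of a change applying both going forward and retroactively, the sketch below adds a computed column to every row at or after a chosen timestamp. The logic and names are hypothetical, not Upsolver's implementation:

```python
# Hypothetical sketch: adding a computed column so it appears both in
# new rows and in historical rows back to a chosen timestamp.

def add_column(rows, name, compute, from_ts):
    """Return rows with `name` filled in for every row at or after from_ts."""
    out = []
    for row in rows:
        row = dict(row)
        if row["ts"] >= from_ts:
            row[name] = compute(row)
        out.append(row)
    return out

rows = [{"ts": 1, "amount": 10}, {"ts": 5, "amount": 20}]
updated = add_column(rows, "amount_x2", lambda r: r["amount"] * 2, from_ts=2)
```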
How does Upsolver output data to Athena?
Upsolver offers unique end-to-end integration with Amazon Athena. Tables are created via the Glue Data Catalog, and Upsolver adds:
- Optimized S3 storage for query performance. See this blog post for the details.
- Near-real-time data availability in Athena.
- Updatable tables in Athena (for CDC).
- The option to edit tables.
- Historical replay / time travel.
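For reference, registering a Parquet table in the Glue Data Catalog with the AWS SDK uses a `TableInput` structure like the sketch below. This is a generic illustration; the database, column names, and S3 path are hypothetical placeholders, and it does not represent how Upsolver itself is configured:

```python
# Generic sketch of a Glue Data Catalog table definition for Parquet data.
# Database name, columns, and S3 location are hypothetical placeholders.

table_input = {
    "Name": "events",
    "PartitionKeys": [{"Name": "dt", "Type": "string"}],  # e.g. daily partitions
    "StorageDescriptor": {
        "Columns": [
            {"Name": "user_id", "Type": "string"},
            {"Name": "amount", "Type": "double"},
        ],
        "Location": "s3://my-bucket/events/",
        "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
        },
    },
    "TableType": "EXTERNAL_TABLE",
}

# With AWS credentials configured, this would be registered via:
# import boto3
# boto3.client("glue").create_table(DatabaseName="analytics", TableInput=table_input)
```

Athena then queries the table immediately, since it reads schemas and partitions from the same catalog.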
How do you ensure that Athena tables are efficiently queryable over time? Do you change the partition or files over time?
Upsolver continuously optimizes your S3 storage to ensure high query performance in Athena. We start with 1-minute Parquet files (for low latency) and compact them into bigger files for query performance. Upsolver keeps the table data consistent using the Glue Data Catalog.
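The small-file compaction idea can be illustrated with a simple size-based binning sketch. The greedy grouping below is hypothetical logic for exposition, not Upsolver's actual compaction:

```python
# Hypothetical sketch of compaction planning: group many small 1-minute
# files into fewer large files, each close to a target size.

def plan_compaction(file_sizes_mb, target_mb=256):
    """Greedily bin file sizes into groups of roughly target_mb each."""
    groups, current, current_size = [], [], 0
    for size in file_sizes_mb:
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# 60 one-minute files of ~10 MB each compact into 3 larger files
groups = plan_compaction([10] * 60, target_mb=250)
```

Fewer, larger Parquet files mean fewer S3 requests and less per-file overhead per Athena query, which is why compaction improves performance.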
Can Upsolver be used with Snowflake?
Yes. For Snowflake, Upsolver will store the data on Amazon S3 and Snowflake will read from there.
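The standard Snowflake pattern for reading data that lives on S3 is an external stage plus a `COPY INTO` statement, sketched below as SQL strings. The bucket, stage, and table names are hypothetical placeholders, and this illustrates Snowflake's general mechanism rather than Upsolver's specific integration:

```python
# Hypothetical sketch: Snowflake reads S3 data via an external stage.
# Bucket, stage, and table names are placeholders.

stage_sql = """
CREATE STAGE IF NOT EXISTS events_stage
  URL = 's3://my-bucket/events/'
  FILE_FORMAT = (TYPE = PARQUET);
"""

copy_sql = """
COPY INTO events
  FROM @events_stage
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
"""

# With snowflake-connector-python and credentials configured, these would
# be run as: cursor.execute(stage_sql); cursor.execute(copy_sql)
```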
What level of security do you offer?
Upsolver is as secure as your AWS account - it can be deployed in your VPC, which means that even Upsolver employees will not have access to your data. Alternatively, you can deploy in Upsolver's VPC on AWS.
What kind of access control does Upsolver provide?
You can define read-only users in Upsolver, and grant/deny permissions to every object using a similar model to AWS IAM. You can also create separate workspaces to reduce complexity.
Does Upsolver store any of my data?
Your data is stored only in your own S3 buckets. We send provisioning data to the Upsolver cloud for cluster management and billing. Sending log data (which doesn't include any raw data) is recommended but optional.
What is your support model?
For ongoing support we are available via in-app chat, Slack, and video calls as needed. We also provide 24/7 phone response for critical issues, based on agreed-upon metrics that we continually monitor.
What is your pricing model?
Compute-based (infrastructure-as-a-service). We charge in Upsolver Units: one unit is one hour of Upsolver running on an instance with 8 CPUs. You can buy Upsolver Units on demand or reserve them in advance, either directly from our sales team or through Upsolver's AWS Marketplace listing.
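The unit arithmetic can be made concrete. Note that scaling units proportionally with CPU count is an assumption made here for illustration; confirm the exact accounting with Upsolver:

```python
# Illustrative unit accounting: 1 Upsolver Unit = 1 hour of Upsolver on an
# 8-CPU instance. Proportional scaling by CPU count is an assumption.

def upsolver_units(instances, cpus_per_instance, hours):
    return instances * (cpus_per_instance / 8) * hours

# e.g. a 3-instance cluster of 8-CPU machines running a full day:
units = upsolver_units(instances=3, cpus_per_instance=8, hours=24)  # 72.0
```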
How does the system scale?
Storage and compute are decoupled. S3 is used for storage and EC2 Spot instances are used for compute. Scaling is linear since local disks are not used at all.
How is the system managed?
Upsolver manages the cluster remotely including troubleshooting, version updates, scaling and monitoring.