Tech Talk: Using Kafka at Drawbridge

Every week, Drawbridge holds “Tech Talks,” where team members give presentations on their current areas of focus. This Tech Talk was presented by Heedong and Sanjay.

At Drawbridge, ad request logs, impression logs, click logs, and conversion logs play a critical role. These raw logs are not only the source of our efficient programmatic ad serving, but are also fed into our reporting pipeline to provide accurate and up-to-date reports to our partners. Recently, however, our log aggregation infrastructure came under strain as we grew rapidly to serve about 10 billion requests per day. The servers responsible for aggregating log files from online servers and transferring them to our Hadoop clusters were suffering high loads at peak times. Log transmissions were delayed as a result, and so were the jobs further down the pipeline, making programmatic ad serving less effective and delaying reports.

Our first attempt to solve this problem was to add more machines to our current log aggregation infrastructure. However, we quickly realized that this would not scale, and that it would entail cumbersome manual changes every time we added machines. We then explored alternative frameworks like Kafka and Flume. We decided to try Kafka not only because it is used by companies like LinkedIn, Twitter, and Netflix, but also because it provides a fast, reliable, durable, and scalable framework.

Kafka consists of three components – producer, broker, and consumer. In our case, online servers are producers that send log files to Kafka brokers every minute. Consumers sit on HDFS data nodes, pull logs from the brokers, and write them directly to HDFS. Our team wrote the producer and consumer programs to meet our specific requirements, but the integration process was otherwise seamless. After the initial integration was done, we had to tune some Kafka parameters. For example, we had to run the Kafka broker with 24G of heap space, a different garbage-collection scheme, and a Java new-generation size of 22G. This was due to out-of-memory issues we faced while running with a smaller memory size. Messages are held in memory until they are written to disk, and since our messages are large and arrive from many online servers every minute, the initial memory size was not sufficient to hold them until they were written. We also added support for JMX for better monitoring. In addition, we had to set the maximum allowed message size to 300MB, since our per-minute log file is around 200MB at peak times. Below are a few of the settings that we changed.

kafka/bin/kafka-server-start.sh

  • export KAFKA_HEAP_OPTS="-Xmx24G -Xms24G"
  • export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops -XX:+UseParNewGC -XX:NewSize=22G -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true"
  • export JMX_PORT=9111

kafka/config/server.properties

  • num.io.threads=8
  • socket.send.buffer.bytes=1048576
  • socket.receive.buffer.bytes=1048576
  • socket.request.max.bytes=314572800
  • message.max.bytes=314572800
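As a quick sanity check on the size limits above, the following minimal sketch shows where 314572800 comes from and that a 200MB peak log file fits under it. It assumes 1MB means 1024 × 1024 bytes, which is the interpretation that matches the configured value; the helper names `mb_to_bytes` and `fits_limit` are hypothetical and are not part of Kafka.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def mb_to_bytes(megabytes: int) -> int:
    """Convert a size in megabytes to bytes."""
    return megabytes * BYTES_PER_MB


# Value used for both socket.request.max.bytes and message.max.bytes.
MAX_MESSAGE_BYTES = mb_to_bytes(300)

# Approximate size of one per-minute log file at peak, per the post.
PEAK_LOG_BYTES = mb_to_bytes(200)


def fits_limit(message_bytes: int, limit: int = MAX_MESSAGE_BYTES) -> bool:
    """Return True if a message of this size is within the configured limit."""
    return message_bytes <= limit


if __name__ == "__main__":
    print(MAX_MESSAGE_BYTES)           # 314572800
    print(fits_limit(PEAK_LOG_BYTES))  # True
```

The 300MB ceiling leaves roughly 100MB of headroom over the 200MB peak file size.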

While integrating Kafka into our platform, we had to make some design decisions. First of all, we had to decide where to run producers, brokers, and consumers. It was intuitive to run producers on the online servers, since those servers generate the logs. We decided to run brokers on dedicated machines with enough memory and disk space, since all the messages are stored on the broker machines. We didn’t want resources like memory, CPU, and disk space to be shared with other programs that might interfere with the brokers’ proper functioning. Consumers could have run on any machine, but we decided to run them on Hadoop data nodes, since the logs must be written to HDFS for consumption and data nodes are closest to HDFS.

Second, we decided to share ZooKeeper with the Hadoop framework. We could have set up a separate, dedicated ZooKeeper cluster for Kafka, but Kafka supports ZooKeeper’s chroot capability, which lets Kafka specify its own ZooKeeper root directory and store its information separately from other applications. Another decision we made was not to compress logs before sending. Compressed logs are smaller, but that comes at the expense of CPU cycles. Online servers process a lot of requests per second, and we don’t want to overload the CPU and interfere with serving capability. We are able to deliver 200MB logs to HDFS within two minutes, even without compression.
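To illustrate the chroot idea, here is a minimal sketch that builds a value for Kafka's `zookeeper.connect` setting, where the chroot path is appended once after the last host:port pair. The host names, the `/kafka` path, and the helper name `zookeeper_connect` are assumptions made for illustration; the post does not say which hosts or path were used.

```python
def zookeeper_connect(hosts, chroot="/kafka"):
    """Build a zookeeper.connect string with a chroot suffix.

    hosts  -- list of "host:port" strings (hypothetical names here)
    chroot -- ZooKeeper path under which Kafka keeps its own znodes
    """
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    if not chroot.startswith("/"):
        raise ValueError("chroot must be an absolute ZooKeeper path")
    # The chroot is written once, after the full comma-separated host list.
    return ",".join(hosts) + chroot


if __name__ == "__main__":
    # Hypothetical ensemble shared with Hadoop.
    value = zookeeper_connect(["zk1:2181", "zk2:2181", "zk3:2181"])
    print("zookeeper.connect=" + value)
```

With a setting like this, Kafka's znodes live under the chosen path rather than at the root of the shared ensemble.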

The last (but not least) decision we made was the replication factor. We set it to 1, which means no replication. The reason is that we want producers to fail to send messages when a broker goes down. When a broker goes down and comes back up, it needs to copy the logs it missed from its replica. However, if the replica goes down before the first broker has copied all the messages, the messages that were not copied will be lost (assuming a replication factor of 2). We thought it would be safer to fail to send messages. Besides, it is not easy to retrieve a specific message from Kafka’s message queue.

Our log transfer time dropped sharply with Kafka. It takes a maximum of two minutes to transfer logs from the online servers to HDFS, which is almost 10x better than our old approach. It is also easier to scale: if brokers are overloaded, adding a broker is less cumbersome than adding machines to our old infrastructure. We are in the early stages of using Kafka, and we believe there’s room for improvement. We will continue to improve our integration with Kafka and will share our findings.

Thank You, Summer Interns!

Our Summer interns are starting to leave us, and we want to thank each of them for their hard work and dedication this Summer. In their own words, here’s what our interns have been up to over the past few months:

Jay – Data Science Intern

“Interning on the Data Science team at Drawbridge has been an incredible experience. I’ve been able to delve into terabytes of data, using some of the most cutting-edge Big Data technologies. Within my first week, I was crafting user profiles – work that was already going into production. As I got familiar with the company and technology, I was given more personal autonomy to work on projects I found interesting. Perhaps the thing I valued most at Drawbridge however was how motivated and passionate everyone was about what they do. That culture has really contributed to both my personal and career development this Summer.”

Cindy – Software Engineer Intern

“As a front-end software engineer intern, I worked on the front-end team with Len and Guanyu. Given the small nature of the team, I was essentially treated like a full-time employee, delving into the code base and implementing features on the front-end. Despite working mostly with the UI, I was also exposed to the back-end, allowing me to get a better grasp of the API I was working with. Languages and technologies I came into contact with include: Javascript, PHP, Java, HTML&CSS, D3, jQuery, Base.js, etc. If I had to choose just one thing I loved about my internship experience this Summer, it would be the level of respect and responsibility I was given despite the fact that I was an intern. I was able to work to solve real issues the company was having and never felt like I was any less than a full-time employee. That is something that I doubt I would’ve found at any other company.”

Lee – Platform Intern

“As a platform intern, I’ve worked on several different components of the Drawbridge platform. Some of my projects have included: a monitoring tool that touches all of our platform services, speeding up part of our reporting pipeline, and doing an experimental overhaul of our charge processing infrastructure. I also got my hands dirty with the occasional patch and bug-fix. I love learning from and working with all the intelligent and creative people at Drawbridge.”

 

Vin – Data Science Intern

“In my time at Drawbridge, I’ve designed, experimented with, and run algorithms on massive volumes of data, and I’ve gotten to see firsthand how learning theory segues into practice. I’ve been exposed to wonderful tools that I didn’t even know existed, and I’ve walked away with a strong sense of where the data science community is headed. As for my interaction with the people of Drawbridge, I’ll quote a saying among jazz musicians: ‘If you’re not the worst musician in your band, you need to find a new band.’ I might not have been the worst musician at Drawbridge, but I was likely the worst data scientist. I couldn’t be happier with the experience.”

Thanks again to all of our interns, and make sure to check out our careers page!

Welcome, Brian Ferrario!

Drawbridge is excited to welcome Brian Ferrario as our new VP of Marketing!

“I’m excited to join a company with all the assets a marketer could ask for – great technology, amazing venture partners, and a stellar executive team, including a rock star CEO,” says Brian. “I think there’s so much potential to build brand awareness and affinity in this space and rise above the countless companies that crowd the ad tech landscape. I’m looking forward to telling the world our story by bringing business-to-consumer-like sensibilities and keen authenticity to everything we say and do.”

Check out the full press release here!

Five Tips for Effective Cross-Device Advertising: #3 – Optimize!

Drawbridge recently released the “Five Tips for Using Cross-Device Advertising Effectively” white paper, which detailed easy practices for maximizing the performance of cross-device campaigns. In this series of blog posts, we’ll dive deeper into each of the five tips.

Tip #3 – Optimize!

Cross-device optimization means two things to Drawbridge – optimizing creative to audiences across devices, and optimizing during the course of the campaign.

Optimize to Audiences Across Devices

When creating cross-device ad campaigns, many people forget that their ad creatives will be seen across desktops, tablets, and smartphones – both in browsers and in apps. It’s important to tailor the content to each device given the different contexts of each platform. For example, content should be formatted and sized differently for each screen and operating system. In addition, app-store and landing pages should be optimized, as trends show an increase in CTRs for campaigns with optimized pages that are clean, complete, and informative.

Dynamic creative ad placements enable customized ads to be displayed based on specific products or locations. Machine-learning algorithms can automatically select optimal creative for each ad impression from an unlimited number of ad combinations to hit cross-device campaign goals.

Optimize Throughout the Campaign

Over the course of your campaign, a strong best practice is to keep optimizing. Focusing on key ROI metrics and continually using data to adjust strategy will help you meet your goals. Some features you can optimize include frequency (how many ads each person is served), time of day (when an ad is served), demographics (age, gender, location), platforms (iOS, Android, Mac, Windows), and even inventory from specific publishers. You and/or your campaign manager can also test creative variations to see which image or text, or even video or animation, is working best and delivering the highest returns.

No matter what feature you optimize – either the creative and landing pages before the campaign, or the delivery features during the campaign – it is important to have a two-week testing period to gather data before seriously tweaking your strategy.

Five Tips for Effective Cross-Device Advertising: #2 – Know Your Audience

Drawbridge recently released the Five Tips for Using Cross-Device Advertising Effectively white paper, which detailed easy practices for maximizing the performance of cross-device campaigns. In this series of blog posts, we’ll dive deeper into each of the five tips.

Tip #2 – Know Your Audience

As we touched on in our previous tip, the ability to segment specific audiences based on demographics and interests is a powerful cross-device advertising tactic. Drawbridge can segment into over 8,000 groups, using first- and third-party data, including proprietary segments of our own.

Advertise to Someone, Not Something!

As a hypothetical, let’s say you want to run an ad campaign for designer iPhone accessories. A sample target audience for this campaign could be US-based women in their 30’s who own iPhones and shop designer brands. If we generously assume that 40% of devices where ads can be served are in the US, half the population is female, a quarter of the population is in their 30’s, and 1 in 50 people regularly purchase from designer brands – simple probability tells us that only one in every 1,000 ads displayed will reach a US-based woman in her 30’s who is loyal to designer brands. But by working with advanced data segments, you could target an audience made up entirely of US-based women in their 30’s who buy designer brands, and then all of those ads will reach the proper audience.
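The one-in-1,000 figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that multiplies the four assumed shares from the paragraph above, treating them as independent (an assumption the example implies but does not state); the variable names are illustrative only.

```python
from fractions import Fraction

# Shares assumed in the hypothetical campaign above.
audience_shares = {
    "device is in the US": Fraction(40, 100),
    "user is female": Fraction(1, 2),
    "user is in their 30s": Fraction(1, 4),
    "user buys designer brands": Fraction(1, 50),
}

# Treating the traits as independent, the chance that an untargeted ad
# reaches the intended audience is the product of the individual shares.
reach = Fraction(1)
for share in audience_shares.values():
    reach *= share

if __name__ == "__main__":
    print(reach)         # 1/1000
    print(float(reach))  # 0.001
```

Exact fractions are used here instead of floats so the product comes out as exactly 1/1000 rather than a rounded decimal.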

Using data to identify user segments helps set up campaigns for success by beginning with the right target audience. Then you can expand your reach and improve the campaign performance by pinpointing high-value and high-engagement audiences. For example, perhaps you learn that in the campaign above, women on the West coast are reacting more favorably than women on the East coast. You could then shift the audience mix to target more West coast users, and/or upload different creative to raise East coast engagement.

Identifying an initial target audience with cross-device data and continuing to learn about and adapt to user preferences during the course of your campaign are among the most effective ways to increase campaign success. Drawbridge case studies point to these successful campaigns, including mobile CTRs well above the industry average, massive scaling while maintaining ROI, and quickly optimizing to raise conversion rates.

Five Tips for Effective Cross-Device Advertising: #1 – Create a Cross-Device Strategy

Drawbridge recently released the Five Tips for Using Cross-Device Advertising Effectively white paper, which detailed easy practices for maximizing the performance of cross-device campaigns. In this series of blog posts, we’ll dive deeper into each of the five tips.


Tip #1 – Create a Cross-Device Strategy

A recent Google study indicated that 90% of consumers use multiple devices sequentially to complete a task over time, including online shopping and planning trips. Agencies and brand advertisers know this, and are increasingly adopting cross-device tactics to deliver more relevant ads to cross-platform audiences. The first step in any cross-device campaign is creating a solid strategy – going in blindly will not work for these advanced campaigns.

What “Cross-Device” Is and Is Not

Cross-Device doesn’t just mean running a mobile campaign and a desktop campaign. While that could meet the requirements of a “multi-device” campaign, in this context, cross-device means leveraging data across devices. Whether using data from desktops to retarget users on their mobile devices, and vice-versa, or using mobile web data to target users within mobile apps – proper cross-device campaigns are targeting the same users across their devices, so cross-device identity is key.

The Strategy

When building a cross-device campaign, make sure you have thought out your goals and objectives and have an idea of how you will reach them. Is this a desktop-to-mobile campaign that will use the targeted audience’s desktop browsing to fuel mobile display ads? Or is your goal to reach users in an app based on actions from mobile web? Most cross-device technology providers offer specific user segments, such as women in their 30’s with iPhones, or men with Android tablets who like to golf. These audience profiles are built with detailed cross-device data, which lets advertisers build customized campaigns.

Work with your cross-device provider to determine your specific campaign strategies, but know that as long as you’re using cross-device data, your campaigns are already on track to be effective!

 

SF Data Mining MeetUp: Data Science at Drawbridge

Last month, Drawbridge hosted a MeetUp for the SF Bay Area’s Data Mining group. Over 75 data scientists gathered at Drawbridge headquarters in San Mateo for a series of talks on data science at Drawbridge.

Sanjay reviewed our platform infrastructure, including our Hadoop cluster and recent move to Kafka. Xiang discussed our traffic allocation policies and process, followed by Nitin discussing user segmentation, and how we identify targeting characteristics. Our model training platform was reviewed by Albert, and dynamic predictive modeling was covered by Randy.

The Drawbridge team enjoyed welcoming everyone to our office and diving into our technology, and we can’t wait to host another MeetUp soon! Below are a few pics from this great event.

[Photos from the MeetUp]

Mobile Ad Veteran Andy Miller Joins Drawbridge Board

This week we announced the addition of Andy Miller to our Board of Directors. Mr. Miller is a current advisor and the former President and COO of Leap Motion, as well as a former Partner at Highland Capital. Previously, Mr. Miller was Vice President of Mobile Advertising at Apple, reporting directly to Steve Jobs. He was Co-Founder and CEO of Quattro Wireless, which was acquired by Apple in 2009 and would become Apple’s mobile advertising platform, iAd.

“Kamakshi and her team have built an incredible company based on very strong technology, and I look forward to contributing to its future direction,” said Mr. Miller. “With eMarketer reporting that 24% of Fortune 500 CMOs say that reaching consumers across digital touchpoints is their biggest challenge in 2014, the opportunities ahead are endless for this innovative, growing company.”

Read the full press release here!

Pre-Roll Video Ads Now Supported Across Devices

Yesterday, Drawbridge announced the ability for advertisers to leverage a supply of premium pre-roll video inventory via our integration with LiveRail, the leading video monetization platform for publishers.

Pre-roll video is among the most effective digital video advertising formats, with completion rates of up to 81% according to a recent IAB study. The popularity and familiarity of pre-, mid-, and post-roll video ads likely stems from their similarity to traditional television advertising; advertisers can repurpose existing ads, and users are familiar with the medium. Drawbridge advertisers will be able to utilize proprietary data to segment audiences and increase relevancy for users, while increasing campaign performance across devices.

Read the full press release here.

New Cross-Device Fact Sheet

Today Drawbridge released a fact sheet that shares some insights from our 1B+ device matches. In addition to determining that the average multi-device consumer has 2.5 devices (consisting of a combination of desktops, smartphones and tablets), the infographic has demographic trends, such as:

  • Roughly 60% of multi-device users are loyal to a single mobile platform, remaining with either iOS or Android for both their smartphones and tablets.
  • Women overwhelmingly own iOS devices, whereas men rely heavily on the Android platform.
  • Users on the East and West coasts use more iOS devices, while Midwesterners use more Android devices.
  • Millennials tend to mix platforms for smartphones and tablets, but Gen X and Baby Boomers choose iOS for their devices.

Download the fact sheet to learn more cross-device trends by age, location, and gender.

 
