
20+ Featured Twitter Open Source Projects

via: 博客园     time: 2016/7/7 13:01:01     views: 2649

Twitter's account on GitHub shows that the company has open-sourced nearly 200 projects, covering distributed architecture, big data, asynchronous networking (client and server), the Web, and other tools. Twitter can fairly be said to be built on open source: Chris Aniszczyk, the company's head of open source, has said that without open-source software Twitter would not exist, and that every tweet a user sends or receives on a mobile device or PC relies on open-source software.

Typeahead.js: jQuery plugin for text autocompletion

This jQuery plugin is a newer Twitter project that supports both remote and local data sets. A notable feature is that it can keep data sets on the client in local storage, which noticeably improves the user experience. It also offers many options for handling remote data sets, such as the request rate and the maximum number of concurrent requests.

Key Features:

  • Stores data locally on the client to optimize load speed
  • Multi-language support, including Arabic
  • Integrates with the Hogan.js template engine
  • Supports combining multiple data sets
  • Supports both local and remote data sets

Source Address:

Twemoji: Twitter's emoji set

Twemoji is Twitter's complete, open-source set of emoji. Developers can download the full set from the GitHub repository and use the emoji in their own applications or web pages.


Source Address:

Hogan.js: JS template engine

Hogan.js is a compiler for Mustache templates produced by the Twitter team. It does not depend on any other library or framework, parses templates efficiently, and is only about 2.5 KB in size. Use it as part of your asset packager to compile templates ahead of time, or include it in the browser to handle dynamic templates.

Source Address:

Effective Scala: Scala language guide

Scala is one of Twitter's main application programming languages. Much of the infrastructure is written in Scala, and several large libraries support its use. Scala is a large and highly effective language, but it should be applied with care in practice. What are its pitfalls? Which features should be embraced, and which avoided? When should a "purely functional style" be used, and when not? Effective Scala draws mainly on Twitter's experience building high-volume distributed systems and services.

Scala provides many tools for succinct expression. Less typing means less reading, less reading usually means faster reading, and so brevity can improve clarity. But brevity is a double-edged sword: it can also have the opposite effect and leave the reader without a correct understanding of the code.

Source Address:

Finagle: RPC framework

Finagle is a fault-tolerant, protocol-agnostic RPC system for the JVM, built with sbt. It comes from Twitter and makes it easy to build robust clients and servers in Java, Scala, or any other JVM-based language. Finagle supports a broad range of request/response RPC protocols as well as many kinds of streaming protocols.

With Finagle you can quickly implement asynchronous RPC clients and servers. Its notion of RPC is flexible enough to cover many variants, including request/response, streaming, and pipelining (for example, HTTP pipelining and Redis pipelining), and it can also easily handle stateful RPC, such as RPC services that require authentication.



Source Address:

FlockDB: distributed graph database

FlockDB stores a graph as a set of edges, each connecting two vertices that are identified by 64-bit integers. For a social graph, these vertex IDs are user IDs, but for an edge that records "favoriting" a tweet, the destination vertex (destination id) is a tweet ID. Each edge also carries a 64-bit position that is used for sorting. (Twitter stores a timestamp in the position of "following" edges, so that your follower list is sorted by time, latest first.)
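
The edge model described above can be illustrated with a small Python sketch. This is not FlockDB's API: the `Edge` class, the `followers_latest_first` helper, and the choice to treat the 64-bit values as unsigned are assumptions made only for this example.

```python
from dataclasses import dataclass

MAX_U64 = 2**64 - 1  # assumption: IDs and positions treated as unsigned 64-bit


@dataclass(frozen=True)
class Edge:
    source_id: int       # e.g. the user being followed
    destination_id: int  # e.g. the follower, or a tweet ID for favorites
    position: int        # sort key; a timestamp for "following" edges

    def __post_init__(self):
        for value in (self.source_id, self.destination_id, self.position):
            if not 0 <= value <= MAX_U64:
                raise ValueError("value does not fit in 64 bits")


def followers_latest_first(edges, user_id):
    """Return the destination IDs of user_id's edges, newest position first."""
    matching = [e for e in edges if e.source_id == user_id]
    matching.sort(key=lambda e: e.position, reverse=True)
    return [e.destination_id for e in matching]


# The graph is simply a set of edges (hypothetical sample data).
edges = {
    Edge(source_id=1, destination_id=20, position=1_467_800_000),
    Edge(source_id=1, destination_id=30, position=1_467_900_000),
    Edge(source_id=2, destination_id=20, position=1_467_850_000),
}
print(followers_latest_first(edges, 1))  # [30, 20]
```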

Source Address:

Snowflake: distributed ID generation algorithm

While migrating its storage system from MySQL to Cassandra, Twitter found that Cassandra has no sequential ID generation mechanism, so it developed its own service for generating globally unique IDs: Snowflake. Its advantages are high performance and low latency, independence from the application, and IDs ordered by time. Its disadvantage is that it has to be developed and deployed as a separate service.

Each ID is a 64-bit integer laid out as follows:

  • 41-bit timestamp (millisecond precision; 41 bits last about 69 years);
  • 10-bit machine identifier (10 bits allow deployments of up to 1024 nodes);
  • 12-bit sequence number (a 12-bit counter lets each node generate 4096 IDs per millisecond);
  • The most significant bit is a sign bit and is always 0.

This is an efficient and simple GUID generation algorithm: a single int64_t field is enough, unlike mainstream 128-bit GUID algorithms. It does not guarantee strictly sequential IDs, but for specific services, such as GUID generation on a game server, it is very convenient. In addition, in a multi-threaded environment, using an atomic sequence number can effectively reduce the amount of locking in the code.
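
The bit layout above can be sketched in Python as follows. This is an illustration under stated assumptions, not Twitter's implementation: the names `compose_id`, `decompose_id`, and `IdGenerator` and the `EPOCH_MS` value are made up for the example, and a plain lock stands in for the atomic sequence number mentioned in the text.

```python
import threading
import time

TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1   # 1023, so up to 1024 nodes
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4095, so 4096 IDs per millisecond
EPOCH_MS = 1_400_000_000_000  # hypothetical custom epoch chosen for this sketch


def compose_id(timestamp_ms, machine_id, sequence):
    """Pack the three fields into one non-negative 64-bit integer."""
    elapsed = timestamp_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= machine_id <= MAX_MACHINE_ID:
        raise ValueError("machine_id outside the 10-bit range")
    if not 0 <= sequence <= MAX_SEQUENCE:
        raise ValueError("sequence outside the 12-bit range")
    return (elapsed << (MACHINE_BITS + SEQUENCE_BITS)) | (machine_id << SEQUENCE_BITS) | sequence


def decompose_id(snowflake_id):
    """Unpack an ID into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    elapsed = snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)
    return elapsed + EPOCH_MS, machine_id, sequence


class IdGenerator:
    """Per-node generator; a lock protects the per-millisecond counter."""

    def __init__(self, machine_id, clock=lambda: int(time.time() * 1000)):
        self._machine_id = machine_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 values used in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return compose_id(now, self._machine_id, self._sequence)


generator = IdGenerator(machine_id=7)
first, second = generator.next_id(), generator.next_id()
print(decompose_id(first), decompose_id(second))
```

Because the timestamp occupies the high bits, IDs from one node increase over time, and IDs from different nodes sort roughly by creation time.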

Source Address:

Diffy: automated testing tool

Diffy is an open-source automated testing tool that can automatically find defects in services based on Apache Thrift or HTTP. With Diffy, only simple configuration is needed; you do not have to write any test code.

Diffy works mainly by comparing the output of a stable version with the output of a release candidate to check that the candidate is correct. Diffy therefore assumes that the candidate should produce output "similar" to that of the stable version. That is, regardless of whether the internals of the stable version and the candidate are the same, their final output should be "similar". The word "similar" is used rather than "identical" because responses to the same request may contain noise that Diffy does not care about, such as:

  • A server-generated timestamp in the response;
  • Random numbers used by the code;
  • Race conditions between systems and services.
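
The idea of comparing for "similar" rather than "identical" output can be sketched as below. This is a simplified illustration, not Diffy's actual noise handling, which the text does not describe: the `diff_responses` helper and the caller-supplied `noisy_fields` set are assumptions made for the example.

```python
_MISSING = object()  # marks a field that is absent from one response


def diff_responses(stable, candidate, noisy_fields=()):
    """Return {field: (stable_value, candidate_value)} for meaningful differences."""
    noisy = set(noisy_fields)
    differences = {}
    for field in sorted(set(stable) | set(candidate)):
        if field in noisy:
            continue  # e.g. timestamps or random values that differ on every call
        old = stable.get(field, _MISSING)
        new = candidate.get(field, _MISSING)
        if old != new:
            differences[field] = (old, new)
    return differences


# Hypothetical responses to the same request from the two versions.
stable = {"user": "alice", "followers": 10, "generated_at": "12:00:01"}
candidate = {"user": "alice", "followers": 11, "generated_at": "12:00:02"}
print(diff_responses(stable, candidate, noisy_fields={"generated_at"}))
# {'followers': (10, 11)}
```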

Source Address:

Scalding: Scala library

Scalding is a Scala library that simplifies the development of Hadoop MapReduce jobs. It is built on top of Cascading. Scalding is similar to Pig, but offers tighter integration with Scala.

A typical Hadoop job on such a distributed system is a word count, that is, counting the occurrences of each word.

Source Address:

Gizzard: general-purpose data sharding middleware

Gizzard is a general-purpose data sharding middleware that Twitter released in April 2011, and it plays an important role in Twitter's architecture. Twitter also published Gizzard's complete source code. With Gizzard, startups and small companies can handle large volumes of data better and faster, and so meet customer needs with fewer resources. Gizzard's main features are:

  • Support for different underlying data stores such as Redis, Memcache, and MySQL; in principle, any store can be supported as long as its write operations are idempotent and order-independent;
  • General data sharding support, including consistent hashing, primary key mod, and custom sharding functions;
  • Different data replication schemes, implemented through a replication tree of nodes;
  • Fault tolerance: when a machine fails, updates are automatically saved to a delayed queue and re-executed after recovery to ensure consistency;
  • Fast migration.
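
The first two features can be illustrated with a short Python sketch of primary-key-mod sharding combined with idempotent, order-independent writes. None of this is Gizzard's API: `Shard`, `shard_for`, and `apply_all` are hypothetical names, and the sketch assumes each write carries a version number that is unique per key.

```python
class Shard:
    """In-memory stand-in for one backing store."""

    def __init__(self):
        self.rows = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Keeping only the highest version makes a write safe to repeat
        # (idempotent) and safe to apply in any order.
        current = self.rows.get(key)
        if current is None or version > current[0]:
            self.rows[key] = (version, value)


def shard_for(key, shards):
    """Primary key mod: pick a shard from the integer key."""
    return shards[key % len(shards)]


def apply_all(writes, shard_count=4):
    shards = [Shard() for _ in range(shard_count)]
    for key, value, version in writes:
        shard_for(key, shards).write(key, value, version)
    return [shard.rows for shard in shards]


writes = [(7, "a", 1), (7, "b", 2), (12, "x", 1)]
replayed = writes + writes[:1]  # one write retried after a failure
# Replaying in a different order, with a duplicate, gives the same final state.
print(apply_all(writes) == apply_all(list(reversed(replayed))))  # True
```

This property is what allows failed updates to be queued and re-executed later without corrupting the data.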


Source Address:

Summingbird: stream processing framework

Summingbird is a streaming MapReduce framework, a large-scale data processing system that lets developers run the same code in batch mode (based on Hadoop/MapReduce), in streaming mode (based on Storm), or in a hybrid mode that combines the two. It is released under the Apache 2 license and was built to solve practical problems that engineers ran into with existing approaches:

  • The aggregation logic in two different systems must be kept in sync;
  • Keys and values must be serialized consistently between each system and the client;
  • The client is responsible for reading from both data stores, performing the final aggregation, and serving the combined result.
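
The hybrid mode and the client-side merge listed above can be pictured with a small Python sketch. It is only an analogy, not Summingbird code: `count_words` and `merged_count` are hypothetical helpers, and two `Counter` objects stand in for the batch and real-time stores.

```python
from collections import Counter


def count_words(tweets):
    """One piece of aggregation logic, shared by the batch and streaming paths."""
    counts = Counter()
    for text in tweets:
        counts.update(text.lower().split())
    return counts


def merged_count(word, batch_store, realtime_store):
    """Client-side merge: combine the result from both stores."""
    return batch_store.get(word, 0) + realtime_store.get(word, 0)


# Older data is processed by the batch job, recent data by the streaming job.
batch_store = count_words(["heron storm", "storm summingbird"])
realtime_store = count_words(["storm heron"])
print(merged_count("storm", batch_store, realtime_store))  # 3
```

The merge is this simple only because the aggregation is a sum, which can be split across the two systems and recombined in any order.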

Source Address:

Algebird: abstract algebra library for Scala

Algebird is an abstract algebra library for Scala. Its code is mainly aimed at building aggregation systems (via Scalding or Storm). Algebird is also a component associated with Summingbird: it uses probabilistic algorithms such as HyperLogLog to speed up computation.

Source Address:

Iago: load testing tool for web sites

Iago is a load testing tool for web sites. It replays recorded access data for a given site to synthesize traffic. Unlike other load generators, it tries to maintain a constant request rate. For example, if you want to send 100K requests per minute to your service, Iago will try to hold that rate throughout the test.
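
The arithmetic behind a constant request rate is worth spelling out: 100K requests per minute means one request every 60 / 100,000 = 0.0006 seconds (0.6 ms), sent on a fixed schedule regardless of how long responses take. The Python sketch below illustrates that idea only; `send_offsets` and `run_at_constant_rate` are hypothetical names, not part of Iago.

```python
import time


def send_offsets(requests_per_minute, count):
    """Offsets in seconds from the start at which each request should be sent."""
    interval = 60.0 / requests_per_minute
    return [i * interval for i in range(count)]


def run_at_constant_rate(send, requests_per_minute, count,
                         clock=time.monotonic, sleep=time.sleep):
    """Call send() on a fixed schedule, independent of how long send() takes."""
    start = clock()
    for offset in send_offsets(requests_per_minute, count):
        delay = start + offset - clock()
        if delay > 0:
            sleep(delay)
        send()


# At 100K requests per minute the requests are about 0.6 ms apart.
print(send_offsets(100_000, 3))

sent_at = []
run_at_constant_rate(lambda: sent_at.append(time.monotonic()), 6_000, 3)
print(len(sent_at))  # 3
```

Scheduling against the start time, rather than sleeping a fixed interval after each request, keeps slow requests from lowering the overall rate.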

Source Address:

Heron: real-time data analysis platform

On May 25, 2016, Twitter announced that it had open-sourced Heron. The rationale behind Heron is that real-time streaming systems are the basis for systematic analysis of data at large scale. Such a system also has to handle billions of events per minute, deliver sub-second latency and predictable behavior, keep data accurate in the event of failures, stay resilient during traffic peaks, and be easy to debug and simple to deploy on shared infrastructure.

To meet these needs, Twitter considered several options: extending Storm, using another open-source system, or developing a new platform. Because some of the requirements meant changing Storm's core architecture, extending it would have required a long development cycle. Other open-source stream processing frameworks did not fully meet Twitter's requirements for scale, throughput, and latency. Moreover, those systems are not compatible with the Storm API; adapting to a new API would have meant rewriting several topologies and modifying higher-level abstractions, leading to a long migration. Twitter therefore decided to build a new system that meets the requirements above and is compatible with the Storm API.

At Twitter, Heron serves as the main streaming system, running hundreds of development and production topologies. Because Heron uses resources efficiently, migrating all of Twitter's topologies to it reduced overall hardware by a factor of three, significantly improving the efficiency of Twitter's infrastructure.

Source Address:

DistributedLog: replicated log service

DistributedLog (DL) is a high-performance replicated log service that provides durability, replication, and strong consistency, which are essential for building reliable distributed systems such as replicated state machines, general publish/subscribe systems, distributed databases, and distributed queues. DL maintains sequences of records in categories called logs (also known as log streams). The processes that write records to a DL log are called writers, and the processes that read and process records from a log are called readers. DL's advantages can be summarized as follows:

  • High performance: DL provides millisecond latency on durable writes with a large number of concurrent logs, while handling a large volume of reads and writes per second from thousands of clients;
  • Durability and consistency: messages are persisted to disk and stored as multiple replicas to avoid loss. Strict ordering guarantees consistency between readers and writers;
  • Varied workloads: DL supports a variety of workloads, including latency-sensitive online transaction processing (OLTP) applications (such as the write-ahead log of a distributed database and in-memory replicated state machines), real-time stream ingestion and computation, and analytical processing;
  • Multi-tenancy: DL is designed with I/O isolation for real-world workloads, so that it can support a large number of logs across multiple tenants;
  • Layered architecture: DL has a modern layered design that separates the stateful storage layer from the stateless serving layer, so that storage can scale independently of CPU and memory, supporting large-scale write fan-in and read fan-out.

Source Address:

Ambrose: visual monitoring system

Ambrose is an open-source visual monitoring system for MapReduce released by Twitter. It can monitor the MapReduce jobs of a Hadoop cluster (currently limited to Apache Pig). Ambrose plans to support:

  • Pig (implemented)
  • Cascading
  • Scalding
  • Cascalog
  • Hive

Source Address:

SecureHeaders: Web security tool

SecureHeaders is Twitter's gift to Web developers: a Web security tool that automatically applies security-related header rules, including Content Security Policy (CSP) to prevent XSS, HSTS to guard against Firesheep-style attacks, and XFO (X-Frame-Options) to prevent clickjacking.

Source Address:

Activerecord-Reputation-System: reputation system for ActiveRecord

Activerecord-Reputation-System is aimed at Rails developers. It lets an application automatically compute reputation values from a network of evaluations, helping developers learn more about their applications and guiding their next decisions. According to Twitter, developers can easily integrate the system into a Rails application, or keep it isolated from the main application for a cleaner design.

The reputation system is a network: when an evaluation is made, the network is updated, and reputation values are then computed and propagated through it. In this network, reputations computed directly from evaluations are called primary reputations, and those computed indirectly are called non-primary reputations.


Source Address:

CocoaSPDY: SPDY framework

CocoaSPDY is a SPDY framework for OS X (Cocoa) and iOS (Cocoa Touch). Building on its earlier contributions to Netty, Twitter updated its iOS application to use SPDY instead of plain HTTP. Twitter observed that latency dropped by up to 30%, and that the improvement is greater "the worse the user's network conditions" are.

SPDY has further advantages: "request multiplexing", that is, the ability to send requests continuously over a single TCP session and receive responses out of order; server push of messages to the client; and compression of request and response headers.

Source Address:

TwUI: UI framework

TwUI is a hardware-accelerated UI framework for the Mac:

  • Uses CoreAnimation for GPU acceleration;
  • Simple MVC development.

Differences from UIKit:

  • Simplified table view cells;
  • Block-based layout and drawRect;
  • A unified coordinate system (bottom-left origin);
  • Sub-pixel text rendering.


Source Address:

Twemproxy: proxy server

Twemproxy is a fast, single-threaded proxy that supports the Memcached ASCII protocol and, more recently, the Redis protocol. It is written entirely in C and licensed under the Apache 2.0 License. Twemproxy's strength is that it can be configured either to eject a failed node and retry it after a period of time, or to stick to the specified key -> server map. This means it can shard a Redis dataset when Redis is used as a data store (node ejection disabled), and provide simple high availability when Redis is used as a cache (node ejection enabled). Its features are:

  • Reduces the number of connections to the cache servers by acting as a proxy;
  • Automatically shards data across multiple cache servers;
  • Supports consistent hashing with different strategies and hash functions;
  • Can be configured to disable failed nodes;
  • Can run as multiple instances, letting clients connect to the first available proxy server;
  • Supports request pipelining and batching, which reduces round trips;
  • Fast;
  • Lightweight.
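
Consistent hashing and node ejection, both mentioned above, can be sketched in Python as follows. This is a generic hash ring, not Twemproxy's own implementation or its exact hashing scheme: the `HashRing` class, the MD5-based hash, and the server names are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring with several points per server."""

    def __init__(self, servers, points_per_server=100):
        if not servers:
            raise ValueError("at least one server is required")
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(text):
        # First 8 bytes of MD5 as a 64-bit integer position on the ring.
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def server_for(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        index = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


full = HashRing(["cache-a", "cache-b", "cache-c"])
reduced = HashRing(["cache-a", "cache-b"])  # "cache-c" has been ejected
keys = [f"user:{i}" for i in range(1000)]
moved = [k for k in keys if full.server_for(k) != reduced.server_for(k)]
# Only keys that lived on the ejected server are remapped.
print(all(full.server_for(k) == "cache-c" for k in moved))  # True
```

With plain modulo hashing, removing one server would remap most keys; on the ring, keys owned by the surviving servers stay where they are.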

Source Address:

Fatcache: cache service

Fatcache lets you run memcached on SSD, so you can use it as a cache for large data sets. Some of its performance figures are:

  • A single node can handle about 100,000 set operations per second with 100-byte items;
  • A single node can handle about 4.5K get operations per second with 100-byte items;
  • 8 Fatcache instances together can handle 32K get operations per second against a single 600 GB SSD;
  • You can attach multiple SSDs to a single machine to improve I/O performance.

Source Address:

AnomalyDetection: R package for automatically detecting anomalies in time series

AnomalyDetection is an R package. Twitter typically uses it during major news and sporting events to scan inbound traffic and find bots and zombie accounts that send unsolicited bulk (marketing) messages.

Figure: local and global anomalies detected by an AnomalyDetection scan

According to Twitter, AnomalyDetection complements BreakoutDetection, which Twitter open-sourced last October.

Detecting traffic anomalies is very challenging for Twitter, which is known as the "pulse of the planet": when traffic is analyzed over a long period (for example, one year), some anomalous activity tends to stay hidden. The causes of anomalies also vary. Some are benign, such as traffic spikes caused by major news events, and some are bad, such as a point-in-time drop in QPS (queries per second), which may indicate a hardware or data collection problem.

Figure: anomaly detection on long-term traffic

Source Address:
